This report describes a pilot study on the development and administration
of a test using a spatial reasoning problem, the 15-puzzle. The test utilized
the on-line capabilities of a real-time computer (1) to record an examinee's
progress on each problem through a sequence of problem-solving "moves" and (2)
to collect additional on-line data that might be of relevance to the
evaluation of examinee performance (e.g., number of illegal and repeated
moves, response latency trends). The examinees, 61 students in an introductory
psychology class, were required to type a sequence of moves that would bring
one 4x4 array of scrambled numbers (start configuration) into agreement with a
second 4x4 array (goal configuration), using as few moves as possible.

Data analyses emphasized the comparison of several methods of indexing
problem difficulty, methods of scoring individual performance, and the
relationship between response latency data, performance, and problem-solving
strategy.

Subjective ratings of the perceived difficulty of replications of the
15-puzzle were obtained from a separate student sample to investigate (1) the
subjective dimensions used by students in evaluating the difficulty of this
problem type, (2) how accurately the actual performance difficulty of these
problems could be evaluated by students, and (3) whether there were reliable
individual differences in difficulty perceptions related to actual performance
differences.

Results of the study suggested that four performance indices might be
useful in indexing problem difficulty: (1) mean number of moves in the sample,
(2) proportion of students solving the problem, (3) proportion of students
solving the problem in the optimal number of moves, and (4) a Special
Difficulty Index, defined as the sample mean number of moves divided by the
minimum number of moves required. Four alternative methods of scoring total
test performance and two methods of scoring individual problem performance
were studied. The scores that took into account differential numbers of moves
between the optimal and maximum number allowed were related somewhat more to
performance ratings obtained from independent judges.

Examination of problem performance indices, the Special Difficulty Index,
and students' perceptions of the difficulty of the test problems indicated that
most of the problems were too easy for most students. However, the possibility
of obtaining a more discriminating subset of problems was suggested by
item-total score correlations obtained for each problem. The data suggested
that better consistency might be obtained using problems of similar difficulty
levels, and it was hypothesized that an adaptive test tailoring problems to the
ability level of each student would increase the reliability of measurement.

Mean initial and total "move" latencies for each problem were strongly
related to some of the performance indices of problem difficulty. At the level
of individual performance, only total latency or problem solution time was
related to problem performance. Latency data appeared to confound differences
in the ability to visualize a sequence of moves and differences in students'
work styles. Strong evidence for these work styles was found in student
consistency of initial, average, and total response latency measures across
all problems.

Perceived  difficulty  ratings  showed  reliable  individual  differences  in  the 
level  and  variability  of  difficulty  perceptions.  The  data  suggested  that  the 
individual  differences  found  were  related  to  individual  differences  in  ability 
to  visualize  and  to  maintain  a  sequence  of  moves  in  short-term  memory.  It  was 
concluded  that  an  adequate  selection  of  problem  replications  should  be  able  to 
tap  these  differences,  resulting  in  reliable  solution  performance  differences. 

Improvements in problem selection and design were suggested by the data in
this study. Future tests of this type should consist of fewer but more
difficult problems, particularly problems not permitting reactive, impulsive
solutions. This type of test would seem especially appropriate for adaptive
administration: (1) scores on problems tailored to the individual's ability
would likely be more highly related to each other, resulting in more highly
reliable total scores; (2) the motivational aspects of the tests, which seem
more taxing and potentially frustrating than conventional item formats, would
likely be improved; and (3) for most testees equally precise measurements
could be obtained in shorter periods of time than with conventional test
administration.

Interactive Computer Administration
of  a  Spatial  Reasoning  Test 


Most research on computer-administered testing has emphasized the
ability of the computer to adapt item difficulties to the ability level
of examinees. Such computerized adaptive tests have been shown to provide
more equiprecise measurement across all trait levels (e.g., Vale,
1975; Vale & Weiss, 1975), to provide generally higher test-retest
stabilities than conventional tests (e.g., Betz & Weiss, 1973, 1975), and
to result in tests of fewer items while achieving the same or higher
levels of measurement accuracy (Weiss & Betz, 1973). In addition,
research has indicated that immediate knowledge of results administered
to testees after each item in computer-administered tests results in
enhanced performance (Betz & Weiss, 1976a) and favorable psychological
effects for examinees (Betz & Weiss, 1976b). Research with computer
administration of a concept attainment task (Johnson & Baker, 1972)
indicated that improved standardization could also be obtained with
computer test administration; and the results of Johnson and Mihal
(1973) and Pine, Church, Gialluca, and Weiss (1979) indicated that
differences in mean performance of racial groups might be reduced or
eliminated with computer-administered testing.

Almost all of the research on computer-administered testing has
measured intellectual abilities and utilized item types that are
conveniently measured by conventional paper-and-pencil tests as well.
However, computers would seem to be especially useful in measuring various
perceptual, memory, and problem-solving abilities that utilize the
computer's capabilities to present novel item formats, modifying item
presentation over time in response to the examinee's performance and
allowing the computer to interact with the student while working on a
task. It is of interest to determine whether the advantages previously
found for computer-administered tests, particularly in an adaptive
mode, can be extended to tests of new abilities that make fuller use of
the unique capabilities of the interactive computer.

Although the use of computers to control the presentation of
visual stimuli on a cathode-ray-tube (CRT) is fairly common in
psychological research, most of this research has been concerned with the
discovery of processes of attention, memory, and perception that apply
to all individuals. Recently, however, investigators have begun to
explore the potential of computer-administered tests for measuring
individual differences in various cognitive abilities. For example, Cory
(1977; Cory, Rimland, & Bryson, 1977) has developed tests for five
abilities (short-term memory, perceptual speed, perceptual closure,
movement detection, and dealing with concepts/information) and
compared scores on these tests to conventional paper-and-pencil tests of
comparable abilities. The conclusion was that these tests provided
measures of attributes that are different from those measured by
paper-and-pencil tests. For example, a "sequential reasoning" dimension,
which did not appear in the paper-and-pencil tests, was identified in
the computerized tests. Computer test administration is also being
increasingly used by psychologists interested in measuring individual
differences in various basic information processing abilities (e.g.,
Chiang & Atkinson, 1976; Hunt, Lunneborg, & Lewis, 1975; Rose, 1978).

A common characteristic of such new ability tests is that
traditional psychometric indices of individual performance (such as
number-correct scores) and item characteristics (such as item difficulty
and item discrimination) may no longer be meaningful. To measure
individual differences in examinee performance, researchers have used
scores derived from reaction time data; slope and intercept parameters
relating reaction time to memory set size (Sternberg, 1969); component
scores on various stages or subprocesses derived from hypothesized
models (e.g., Clark & Chase, 1972); and parameter scores (d', beta)
derived from signal detection theory. Some, but not all, researchers
using such measures of individual differences have attempted to
demonstrate the psychometric characteristics (e.g., reliability) of these
new performance indices. Such a demonstration is necessary, however,
for each new score derived from new types of ability tests before the
validity and utility of the scores can be investigated.

Purpose 

This report describes a pilot study of the development and
administration of a spatial reasoning problem, the 15-puzzle, which
utilized the on-line capabilities of a real-time computer to record a
testee's progress on each problem throughout a sequence of "moves" and
to collect additional on-line data that might be of relevance to the
evaluation of testee performance. Although spatial ability has been
shown to be an important special ability predictive of some job
criteria (for a summary of predictive validities for various occupational
areas between 1920 and 1971, see Ghiselli, 1973), it was also hoped
that this problem type and others to be developed would be able to tap
generalized problem-solving and reasoning abilities.

The  15-puzzle  problem  used  in  this  study  involved  presentation  of 
the  numbers  1  to  15  in  a  4  x  4  matrix  of  scrambled  numbers  and  in  a 
target  matrix  with  the  numbers  in  another  configuration.  The  testee 
was  required  to  move  the  numbers  in  the  first  configuration,  one  number 
at  a  time,  to  match  the  second  configuration.  This  problem  type  was 
chosen  because  it  seemed  to  tap  abilities  important  in  problem-solving 
situations,  especially  in  the  spatial  domain,  while  providing  the  fol¬ 
lowing  additional  advantages: 

1. Utilization of the unique capabilities of interactive
computers.

2. The existence of a well-defined optimal solution against which
to evaluate a student's performance.

3. The ease of generating large numbers of replications of
varying and relatively controllable difficulty levels.

If the advantages of computerized adaptive testing are to be
applied to tests of this type, precise indices of individual performance
and problem difficulty must be devised. Thus, an important emphasis in
this study was on a comparison of alternative methods for quantifying
student performance and a comparison of alternative indices of problem
difficulty for the 15-puzzle spatial reasoning problem. For example,
the number of moves a student requires to solve replications of the
15-puzzle may not be an adequate index of problem performance where the
minimum number of moves for various problems differs. Some of the
questions studied were: Is the minimum number of moves to solution a
meaningful index of problem difficulty, or do other physical aspects of
the puzzle configuration influence problem difficulty as well? Can
response latencies be used to quantify difficulty and/or individual
performance? In addition, to determine whether or not the 15-puzzle task
could be used to successfully measure problem solving in the spatial
domain, the reliability of individual performance scores across
problems of similar and varying difficulty levels was examined.

One further advantage of the problem type studied here may be its
interactive game format, which may prove to be more motivating to
examinees than the usual separate item format. In addition, the provision
of knowledge of results may be a built-in feature of these problems,
since the students can tell when they have reached a solution. On the
other hand, the need for perseverance and the possibly greater
potential for frustration and anxiety with this type of problem must also
be considered. Thus, motivational data were collected and examined in
this study in an attempt to draw some preliminary conclusions about the
psychological effects of working on such problems.

To  a  large  degree,  the  psychological  effects  of  problems  of  this 
type  on  examinees  will  depend  on  the  perceived  difficulty  of  replica¬ 
tions  of  the  problems.  It  would  seem  that  problems  of  this  type  that 
are  inappropriate  for  the  student's  ability  level  may  be  even  more  dis¬ 
couraging  than  the  typical  conventional  test  item  because  the  student 
cannot  merely  guess  and  continue  with  the  next  item.  In  problems  of 
this  type,  guessing  becomes  not  a  response  bias  to  be  eliminated  but  a 
trial-and-error  strategy  on  the  part  of  the  examinee.  Thus,  eventual 
adaptation  of  problems  to  the  student's  ability  level  may  be  especially 
important  for  making  the  testing  experience  reasonably  pleasant  and 
nonfrustrating. 

However, whether an adaptive presentation of problems can actually
equalize the psychological effects of such a test will depend largely
on whether students can accurately perceive the difficulties of the
items administered (Prestwood & Weiss, 1977). Even though some
previous research has found agreement between perceived and objective
indices of item difficulty (e.g., Bratfish, Dornic, & Borg, 1972; Munz &
Jacobs, 1971; Prestwood & Weiss, 1977), it would seem necessary to
answer this question anew when item or problem types differ
significantly. The present study, therefore, reports some preliminary data
relating to the similarity of objective and perceived indices of
problem difficulty for replications of the 15-puzzle.

cathode-ray-tube (CRT) display terminal. The sequence of problem
presentations and the simultaneous collection of performance data were
controlled by a computer program written for a Hewlett-Packard real-time
minicomputer.

Figure  1  shows  a  sample  of  the  display  presented  on  the  CRT  screen 
while  the  student  worked  on  each  problem.  As  Figure  1  shows,  the  stu¬ 
dent  was  instructed  to  type  a  three-character  "move"  on  the  terminal 
keyboard  specifying  which  number  in  the  left  pattern  he  or  she  wished 
to  move  left,  right,  up,  or  down  one  square  in  an  attempt  to  eventually 
bring  the  configuration  of  numbers  in  the  left  pattern  into  agreement 
with  the  pattern  of  numbers  on  the  right. 


Figure  1 

Sample  15-Puzzle  Problem 

Make your "moves" in this pattern:

10  9  3  7
4  8  6
12  5  2  14
1  11  15  13

Try to match this pattern:

10  2  9  7
12  8  6
5  4  3  14
1  11  15  13


Enter your move by typing three characters and the "RETURN" key.

The first two characters should be the number you want to move.
If the number has only one digit, type one space and then the one
digit number.

The third character should be:

L - if you want to move the number one square to the left.

R - if you want to move the number one square to the right.

U - if you want to move the number up one square.

D - if you want to move the number down one square.

After each three-character move was typed, the computer processed
the move for legality. If the move was legal, the pattern on the left
was updated immediately using a cursor addressing system, which allowed
specified screen locations to be manipulated without rewriting the
entire screen. If the three-character move was illegal, an explanatory
error message was displayed, and in some cases the student was
instructed to notify the test proctor for assistance. The testing
program detected illegal moves of both a syntactical (e.g., typing errors)
and a logical (e.g., trying to move a number into an already occupied
square or beyond the outer edge of the pattern) nature. Appendix A
contains a complete list of diagnostic error messages utilized by the
testing program.
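
The legality check described above can be illustrated with a minimal Python sketch. It is not the original minicomputer program: the board representation, the function names `parse_move` and `apply_move`, the direction letters L, R, U, and D, and the error wording are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Board = List[List[Optional[int]]]  # None marks the empty square

# Row and column offsets for each direction letter (letters are assumed).
DIRECTIONS = {"L": (0, -1), "R": (0, 1), "U": (-1, 0), "D": (1, 0)}


def parse_move(text: str) -> Tuple[int, str]:
    """Syntactic check: two characters for the number, one for the direction."""
    if len(text) != 3:
        raise ValueError("a move must be exactly three characters")
    number, letter = text[:2].strip(), text[2].upper()
    if not number.isdigit() or not 1 <= int(number) <= 15:
        raise ValueError("the first two characters must be a number from 1 to 15")
    if letter not in DIRECTIONS:
        raise ValueError("the third character must be L, R, U, or D")
    return int(number), letter


def apply_move(board: Board, text: str) -> Board:
    """Logical check: the target square must be on the board and empty."""
    tile, letter = parse_move(text)
    row, col = next(
        (r, c) for r in range(4) for c in range(4) if board[r][c] == tile
    )
    d_row, d_col = DIRECTIONS[letter]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < 4 and 0 <= new_col < 4):
        raise ValueError("that move goes beyond the outer edge of the pattern")
    if board[new_row][new_col] is not None:
        raise ValueError("that square is already occupied")
    updated = [r[:] for r in board]  # leave the caller's board unchanged
    updated[new_row][new_col], updated[row][col] = tile, None
    return updated
```

Separating the syntactic check from the logical check mirrors the two classes of illegal moves distinguished in the text, so each class could be counted separately.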

Performance data. While the student worked on the problem, the
following data were collected on-line by the computer:

1. Whether the problem was solved or not, i.e., whether the
student was able to type a sequence of moves that would make the
configuration on the left match the configuration on the right.

2. The number of moves required for solution.

3. The number of illegal moves, including impossible moves of
both a syntactical and a configural nature.

4. The number of repeated moves, i.e., how many times the student
backed up, or reversed a possibly incorrect sequence of
moves to return to an earlier pattern configuration.

5.  Response  latencies,  i.e.,  the  time  in  seconds  required  for 
each  move. 

6.  The  actual  sequence  of  moves  utilized. 
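
One way to count repeated moves (item 4) is to record every configuration reached and count each return to a configuration seen earlier. The report does not give the exact counting rule used by the testing program, so the Python sketch below is only one plausible reading; the function name `count_repeated_moves` is hypothetical.

```python
from typing import Hashable, Iterable


def count_repeated_moves(configurations: Iterable[Hashable]) -> int:
    """Count moves that return the pattern to an earlier configuration.

    `configurations` is the start pattern followed by the pattern after
    each legal move, each in a hashable form (for example, a tuple).
    """
    seen = set()
    repeats = 0
    for config in configurations:
        if config in seen:
            repeats += 1  # this move brought back an earlier pattern
        seen.add(config)
    return repeats
```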

The performance data were collected for possible use in drawing
inferences about several aspects of spatial problem-solving ability.
For example, the number of illegal moves, as well as the initial
response latencies, might index the student's initial ability to define
and to clarify the task situation. The sensitivity of students to the
task information provided (in this case, the continually updated left
pattern and its relationship to the right pattern) and their ability to
plan a sequence of moves might be indexed by the number of nonoptimal
moves, the number of repeated moves, and the total number of moves
required. A student's inability to recenter (Sweeny, 1953; Wertheimer,
1959) or the presence of a debilitating set might be inferred from a
persistent sequence of moves that did not bring the start pattern
closer to the goal pattern.

The  pattern  of  response  latencies  as  the  student  approached  the 
solution  might  also  be  useful  information  in  making  inferences  about  a 
student's  problem-solving  strategy.  For  example,  in  the  initial  stages 
of  the  problem,  a  planning-ahead  strategy  might  be  inferred  from  longer 
initial  response  latencies,  and  a  more  impulsive,  reactive  strategy  or 
problem-solving  style  would  be  associated  with  shorter  latencies.  If 
the  student  was  sensitive  to  the  relationship  between  the  two  stimulus 
patterns,  a  shortening  of  the  response  latencies  might  be  expected  as 
the  left  (start)  pattern  approached  the  right  (goal)  pattern  (Hayes, 
1965). 

Individual differences in the ability to visualize or to maintain
sequences of moves of varying lengths in short-term memory might also
be reflected in the patterns of response latencies. For example, an
individual with a greater ability to maintain a sequence of moves in
short-term memory might need longer pauses or study points only once
every six or seven moves, as opposed to every three or four moves.
Isolation and interpretation of such differences may be difficult,
however, since momentary differences in short-term memory capacity may
also reflect differences in the allocation of limited cognitive
resources (Norman, 1976).

Test Administration

Sixty-one students in an introductory psychology class took the
problem-solving test. Of these, tests for five students had to be
discarded because of computer problems. After being logged onto the CRT
by a test monitor, the student was presented a series of instructional
screens by the computer. The text of each instruction screen is in
Appendix B. The presentation of instruction screens was student paced,
with the student pressing the "SPACE BAR" and "RETURN" key on the
terminal keyboard to proceed to the next instruction screen.

As Appendix B shows, the instructions first told the student how
to utilize important keyboard characters, such as the "RETURN" key, to
enter responses. Next, after describing the 15-puzzle task and
providing instructions on entering a three-character move, the
instructions told the student how to correct a mistyped move before
transmitting the move to the computer. Biographical information,
including name, student identification number, age, sex, year in
school, major field of study, race, and grade-point average, was then
requested from each student. The final instructional screen (see
Screen 16 in Appendix B) was intended to standardize the desired
motivational set for each student. The student was then presented with
a practice problem. This practice problem (Problem 1) was very simple,
requiring only three straightforward moves, and was used to allow
students to clarify questions and to gain confidence in entering moves
under nontesting conditions.

Following the practice problem, students were presented a maximum
of 12 problems (Problems 2 to 13). These problems varied in
difficulty, which was initially indexed by the minimum number of moves
required for solution (solution path length) using a solution algorithm
provided by Nilsson (1971). The 12 problems consisted of one problem
each requiring 4 and 6 moves and two problems for each of the following
solution path lengths: 8, 10, 12, 14, and 16. The 13 problems used,
along with their solution path length and other physical problem
characteristics, are in Appendix C.
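
The report does not reproduce the solution algorithm it took from Nilsson (1971). As a stand-in, the sketch below computes a solution path length with A* search guided by the Manhattan-distance heuristic, a standard technique for sliding-tile puzzles. The state encoding (a 16-tuple with 0 for the empty square), the function names, and the `max_moves` search bound are assumptions for illustration.

```python
import heapq
from typing import Dict, Iterator, Optional, Tuple

State = Tuple[int, ...]  # 16 entries in row-major order; 0 is the empty square


def manhattan(state: State, goal_pos: Dict[int, int]) -> int:
    """Sum of row and column distances of every tile from its goal square."""
    total = 0
    for index, tile in enumerate(state):
        if tile:
            target = goal_pos[tile]
            total += abs(index // 4 - target // 4) + abs(index % 4 - target % 4)
    return total


def neighbors(state: State) -> Iterator[State]:
    """All patterns reachable by sliding one adjacent tile into the empty square."""
    blank = state.index(0)
    row, col = divmod(blank, 4)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 4 and 0 <= c < 4:
            other = r * 4 + c
            nxt = list(state)
            nxt[blank], nxt[other] = nxt[other], nxt[blank]
            yield tuple(nxt)


def solution_path_length(start: State, goal: State, max_moves: int = 20) -> Optional[int]:
    """Minimum number of moves from start to goal, or None if above max_moves."""
    goal_pos = {tile: i for i, tile in enumerate(goal)}
    best = {start: 0}
    frontier = [(manhattan(start, goal_pos), 0, start)]
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best[state]:
            continue  # a shorter route to this pattern was already expanded
        for nxt in neighbors(state):
            new_g = g + 1
            estimate = new_g + manhattan(nxt, goal_pos)
            if estimate <= max_moves and new_g < best.get(nxt, new_g + 1):
                best[nxt] = new_g
                heapq.heappush(frontier, (estimate, new_g, nxt))
    return None
```

Because the Manhattan distance never overestimates the remaining moves, the first time the goal is removed from the queue its move count is minimal. The bound keeps the search small for the short problems (4 to 16 moves) used in this test.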

Data for all students were not obtained for all the problems for a
variety of reasons. Since the students differed in both solution
efficiency and in the amount of time they had available to participate in
the study, not all students completed all the problems. In addition,
after about half the students (the first 33) had completed the tests,
it appeared that a test consisting of 12 problems was somewhat too long
and that some students did not have enough time in the experimental
hour to finish the longer problems. For this reason, two of the
easiest problems (Problems 2 and 3), which everyone seemed to be solving in
the minimum number of moves, were eliminated to make the test shorter.
Finally, in a few cases, data for a single problem were lost for a
student due to computer problems.

There was no fixed time limit for each problem. However, in order
to prevent a student from spending too much time on a single problem to
the exclusion of others, a message advising the student to notify the
test proctor was displayed on the terminal screen after the student had
been working on a problem for what was thought to be an unduly long
time. The maximum time allowed for each problem was a multiplicative
function of the minimum number of moves required, up to a maximum of 15
minutes. For example, about 4 minutes were allowed for a problem
requiring 3 moves, about 10 minutes for a problem requiring 8 moves, and
about 15 minutes for problems requiring 12 to 16 moves. The proctor
then had the option of advancing the student to the next problem or
resetting the problem timer to allow the student to continue work on
that problem. Students were encouraged to discontinue work on a
problem unless they felt confident they were near solution and needed only
a little more time.


Similarly, the student was stopped when he or she had taken the
maximum number of moves allowable for a problem. The maximum number of
moves allowed by the computer was also a function of the minimum number
of moves (solution path length) required to solve the problem. The
maximum number of moves was defined as the solution path length times
3.5; if the maximum number of moves was greater than 28, the maximum
move limit was set equal to 28. This maximum was intended to terminate
work on a problem the student appeared unable to solve so that he/she
would proceed to subsequent problems. The number of moves it would
take to recover from nonoptimal moves was taken into consideration in
specifying this initial maximum move limit. It was realized, however,
that this maximum limit might have to be adjusted once actual
performance data were obtained.
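
The initial move limit can be written as a one-line rule. The Python sketch below follows the definition in the text (solution path length times 3.5, capped at 28); the function name is hypothetical, and truncating fractional limits for odd path lengths is an assumption, since the report does not say how they were rounded.

```python
def max_moves_allowed(solution_path_length: int, cap: int = 28) -> int:
    """Initial move limit: 3.5 times the minimum number of moves, capped."""
    # int() truncates; the rounding rule for odd path lengths is assumed.
    return min(int(3.5 * solution_path_length), cap)
```

Under this rule a 4-move problem allows 14 moves and a 6-move problem allows 21, while every problem with a solution path length of 8 or more reaches the cap of 28.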

The  maximum  number  of  moves  allowed  was  increased  for  about  half 
the  students  to  determine  if  students  could  reach  solution  if  they  were 
given  more  moves.  Thirty-three  students  were  limited  to  28  moves  for 
the  longest  problems  and  the  remainder  were  allowed  43  moves.  The 
larger  move  limit  seemed  to  allow  more  students  to  reach  solutions  for 
some  of  the  longer  problems. 

A student was permitted to voluntarily choose to terminate a
problem before the solution was reached by asking the test proctor to
advance him/her to the next problem. In the few instances where this
situation arose, students were encouraged to continue work on a problem
unless the time limit message had already appeared.

When the student successfully completed a problem by matching the
start and goal pattern, the computer displayed the message:

Good. You have succeeded in matching the two patterns. Press the
SPACE bar and "RETURN" to start the next problem.

Test Reaction Data

After the student completed all the test problems, a message thanked
the student for his/her participation. Students then completed a
paper-and-pencil questionnaire providing information on prior experience,
difficulty perceptions, and other motivational questions that could be used
to evaluate student reactions to this type of test.

Since  a  general  measure  of  spatial  reasoning  ability  was  sought, 
individual  differences  in  test  performance  should  not  be  accounted  for 
by  specific  prior  experience  with  this  type  of  puzzle.  Therefore,  the 
first  question  asked  the  student  how  often  he/she  had  worked  on  this 
kind  of  puzzle  in  the  past.  In  order  to  evaluate  the  clarity  of  the 
instructions  for  this  new  type  of  test  item,  the  second  question  asked 
students  how  much  difficulty  they  had  in  understanding  the  instruc¬ 
tions.  Because  this  was  the  first  time  this  problem  type  had  been  used 
on  this  student  population,  it  was  not  known  before  data  collection  how 
difficult  puzzle  replications  would  have  to  be  to  challenge  the  stu¬ 
dents.  Thus,  the  third  question  obtained  information  on  how  difficult 
the  students  thought  the  puzzles  in  the  test  were. 

It was felt that the student's motivation level during testing
would be especially important for performance on problems of this type,
which require more concentration and within-problem perseverance than
more typical single item formats. Consequently, Question 4 asked
students how hard they tried to solve each puzzle in the optimal number of
moves, and Question 5 asked whether the length of the test affected
their motivation. Students indicated how nervous or uncomfortable they
were while working on the puzzles in Question 6. Overall evaluations
of how well they thought they had performed and how well they enjoyed
working on the puzzles were provided by students in Questions 7 and 8,
respectively. Any further comments the students had were elicited by
Question 9. Since all Puzzle Reaction questions referred to different
content, no scores were derived across items.

Data Analysis

Indices of difficulty. Data collected for each problem
were used to describe problem difficulty in several ways. For each of
the 13 problems (12 problems plus 1 practice problem), the frequency
and proportion of students requiring various numbers of moves to solve
or to fail to solve the problem were calculated. The following were
also computed for each of the 13 problems as potential indices of
problem difficulty:

1.  The  mean  number  of  moves  taken.  This  was  the  average  number 
of  legal  moves  used  by  the  student  to  solve  the  problem  or  the 
number  of  moves  at  which  the  problem  was  terminated  due  to 
using  too  many  moves  or  too  much  time.  Since  the  move  limit 
was  extended  from  28  to  43  for  about  one-third  of  the  stu¬ 
dents,  the  mean  number  of  moves  was  slightly  lower  for  the 
longer  puzzles  than  it  would  have  been  had  all  students  been 
allowed  the  larger  maximum  number  of  moves. 

2. The proportion of students solving the problem within the
original maximum number of moves (i.e., for the longer puzzles,
28 moves).

3.  The  proportion  of  students  solving  the  problem  in  the  minimum 
or  optimal  number  of  moves. 

4.  The  mean  number  of  illegal  moves. 

5.  The  mean  number  of  repeated  moves. 

In  addition,  for  each  problem  a  Special  Difficulty  Index  was  com¬ 
puted,  defined  as  the  mean  number  of  moves  used,  divided  by  the  minimum 
number  of  moves  required  (solution  path  length).  This  index  was  de¬ 
signed  to  provide  a  possible  difficulty  index  that  was  corrected  for 
differences  in  minimum  solution  path  lengths  for  each  problem.  For 
example,  a  problem  requiring  16  moves  may  not  be  more  difficult  (in  the 
sense  that  nearly  everyone  could  solve  it  in  the  minimum  number  of 
moves)  than  a  problem  requiring  only  10  moves. 
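
The performance-based indices above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original analysis program: the function name `difficulty_indices`, its arguments, and the sample data are hypothetical, and the move limit of 28 is taken from the text.

```python
from statistics import mean

def difficulty_indices(moves, solved, path_length, max_moves=28):
    """Summarize one problem across students (hypothetical helper).

    moves       -- legal moves per student, at solution or at termination
    solved      -- parallel list of booleans (True if the student solved it)
    path_length -- minimum number of moves required for the problem
    max_moves   -- original move limit (28 in the text)
    """
    n = len(moves)
    mean_moves = mean(moves)
    # Proportion solving within the original maximum number of moves.
    prop_solved = sum(s and m <= max_moves for m, s in zip(moves, solved)) / n
    # Proportion solving in exactly the minimum (optimal) number of moves.
    prop_optimal = sum(s and m == path_length for m, s in zip(moves, solved)) / n
    # Special Difficulty Index: mean moves relative to the minimum path length.
    special_difficulty = mean_moves / path_length
    return {
        "mean_moves": mean_moves,
        "prop_solved": prop_solved,
        "prop_optimal": prop_optimal,
        "special_difficulty": special_difficulty,
    }

# Made-up data: four students on a problem with a 10-move minimum path.
example = difficulty_indices([10, 10, 15, 28], [True, True, True, False], 10)
```

A Special Difficulty Index near 1.0 means that most students solved the problem in close to the minimum number of moves, whatever that minimum was.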

A  possible  advantage  of  the  relatively  formal  nature  of  the  15- 
puzzle  is  the  availability  of  potentially  objective  physical  problem 
characteristics,  which  could  function  as  potential  indices  of  task  dif¬ 
ficulty.  One  such  index,  solution  path  length  (i.e.,  the  minimum 
number  of  moves  required  for  solution),  has  already  been  mentioned. 
Several  other  indices  relating  the  start  pattern  to  the  goal  pattern 
were  computed  to  determine  if  they  related  empirically  to  the  actual 
difficulty in solving each problem as indexed by student performance.
If such a relationship was found, these physical indices could be used
in selecting problem replications for inclusion in a test on the basis
of their predicted difficulty.

The  following  physical  problem  characteristics  of  each  pair  of 
patterns  were  considered  as  potential  difficulty  indices: 

1.  Path  length:  the  minimum  number  of  moves  required  to  solve  the 
problem. 

2. The number of squares not matching in the start and goal patterns
at the start of the problem (maximum = 16).

3. The number of rows disrupted or not matching in the two patterns
(maximum = 4).

4. The number of columns disrupted or not matching in the two
patterns (maximum = 4).

5. Euclidean distance function: the sum of the distances of each
number's position in the start pattern from its position in
the goal pattern using the Pythagorean theorem (i.e., diagonal
distances allowable).

6.  City-block  distance  function:  the  sum  of  the  distances  of  each 
number's  position  in  the  start  pattern  from  its  position  in 
the  goal  pattern  with  only  vertical  and  horizontal  (not  diago¬ 
nal)  displacements  calculated. 

Appendix  C  shows  each  of  these  physical  problem  characteristics  for 
each  of  the  13  problems. 
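
As an illustration, characteristics 2 through 6 can be computed directly from a pair of 4x4 arrays. The sketch below rests on assumptions not stated in the report: the blank square is coded as 0, it counts as a square when tallying mismatches, and it is excluded from the two distance sums (which are described as sums over each number). Path length (characteristic 1) requires a search for the shortest solution and is not computed here. The function names are hypothetical.

```python
import math

def positions(grid):
    """Map each value in the grid to its (row, column) position."""
    return {v: (r, c) for r, row in enumerate(grid) for c, v in enumerate(row)}

def physical_indices(start, goal, blank=0):
    """Physical characteristics relating a start pattern to a goal pattern."""
    size = len(start)
    # Squares whose contents differ between the two patterns (blank included).
    mismatched = sum(start[r][c] != goal[r][c]
                     for r in range(size) for c in range(size))
    # Rows and columns that differ anywhere between the two patterns.
    rows_disrupted = sum(start[r] != goal[r] for r in range(size))
    cols_disrupted = sum([row[c] for row in start] != [row[c] for row in goal]
                         for c in range(size))
    s, g = positions(start), positions(goal)
    tiles = [v for v in s if v != blank]  # assumption: blank excluded
    # Straight-line (Pythagorean) displacement summed over numbered tiles.
    euclidean = sum(math.hypot(s[v][0] - g[v][0], s[v][1] - g[v][1])
                    for v in tiles)
    # Vertical plus horizontal displacement summed over numbered tiles.
    city_block = sum(abs(s[v][0] - g[v][0]) + abs(s[v][1] - g[v][1])
                     for v in tiles)
    return {
        "mismatched": mismatched,
        "rows_disrupted": rows_disrupted,
        "cols_disrupted": cols_disrupted,
        "euclidean": euclidean,
        "city_block": city_block,
    }

# Made-up example: the start differs from the goal by one move of tile 15.
goal = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 0]]
start = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 0, 15]]
indices = physical_indices(start, goal)
```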

Assessment of student performance. Deriving scores for a student
on a single problem, and on this type of test as a whole, is complicated
by several factors. For example, some students were not able to
work on the test as long as others; some students naturally worked
faster than others; and in a few cases, data on isolated problems were
lost because of computer failure. In addition, half the students did
not work on Problems 2 and 3, since these were eliminated to shorten
the test.

As a result, scoring a student's performance merely by the number
of problems solved was not only undesirable from a theoretical point of
view but it was also impractical due to the above confounding factors.
For this reason, and also from the point of view of using these problems
in future adaptive testing, it was desirable to develop scoring
methods that did not depend on the particular problem replications on
which the student worked. This suggested using such measures as the
proportion  of  problems  worked  on  by  the  student  that  he  or  she  was  able 
to  solve  or  the  proportion  of  problems  attempted  that  the  student 
solved  in  the  optimal  number  of  moves.  However,  these  measures  do  not 
take  into  account  the  differential  difficulty  of  different  problems  or 
individual  differences  in  the  number  of  moves  used  between  the  optimal 
and  maximum  allowed  number.  Using  the  number  of  moves  a  student  made 
on  a  problem  would  not  take  into  account  the  differential  solution  path 
lengths  and  the  difficulty  of  problems.  Potential  measures  that  would 
take  into  account  the  difficulty  of  various  problems,  such  as  the  mean 
difficulty  of  problems  solved  or  the  highest  difficulty  problem  solved 
in the optimal number of moves, would not be comparable for students
who  did  not  receive  problems  of  the  same  difficulty  level. 

Taking into consideration all these problems, two methods of scoring
student performance on individual problems were devised:

1. Score 1 = (the number of moves the student used) / (the minimum
number of moves actually required)

For example, if a student took 15 moves to solve Problem 6,
which required 10 moves, his/her score was 15/10 = 1.5. Since
a perfect score would be 1.0, this student required 50% more
moves than were necessary. Note that although this score corrected
for different solution path lengths of various problems,
it did not take into account the difficulty of the problem
as indexed by the total group's performance on the problem.

2. Score 2. This score was Score 1 adjusted by the Special Difficulty
Index. Thus,

Score 2 = (Score 1)/(Special Difficulty Index)

This score reduces to

Score 2 = (the number of moves the student used) / (the mean number
of moves required by the total group)

Thus, if Score 2 = 1.0, the student's performance was equal to
the group average. If Score 2 was less than 1.0, the student
solved the problem in fewer moves than the average student;
conversely, if Score 2 was greater than 1.0, the student
solved the problem in more moves than the average student.
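
A short sketch may make the two problem scores concrete. The 15-move, 10-move example is the one given for Problem 6; the group mean of 12.0 moves is a made-up value used only for illustration, and the function names are hypothetical.

```python
def score_1(moves_used, min_moves):
    """Moves used relative to the minimum number of moves required."""
    return moves_used / min_moves

def special_difficulty_index(group_mean_moves, min_moves):
    """Group mean number of moves relative to the minimum required."""
    return group_mean_moves / min_moves

def score_2(moves_used, min_moves, group_mean_moves):
    """Score 1 divided by the Special Difficulty Index.

    Algebraically this equals moves_used / group_mean_moves.
    """
    return (score_1(moves_used, min_moves)
            / special_difficulty_index(group_mean_moves, min_moves))

s1 = score_1(15, 10)        # 1.5: 50% more moves than necessary
s2 = score_2(15, 10, 12.0)  # group mean of 12.0 is hypothetical
```

With these numbers Score 2 is about 1.25, that is, the student used about 25% more moves than the (hypothetical) group average on this problem.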

To  determine  whether  these  specially  defined  scores  were  any  more 
meaningful  than  more  direct  scores,  such  as  the  proportion  of  problems 
solved,  the  relationships  between  the  following  four  scores  for  the 
test  as  a  whole  were  examined: 

1. PROPS = the proportion of problems that the student attempted
(worked on) and solved within the maximum number of moves (28).

2. PROPM = the proportion of problems that the student attempted
and solved in the minimum (optimal) number of moves.

3. Total 1 = the average Score 1 obtained on the problems the
student attempted.

4. Total 2 = the average Score 2 obtained on the problems the
student attempted.

It  was  hypothesized  that  the  Total  2  score  would  prove  to  be  the  most 
meaningful  score,  since  it  took  into  account  both  the  solution  path 
length  and  the  difficulty  of  the  problems  the  student  attempted  and  did 
not  depend  on  the  number  of  problems  attempted.  By  adjusting  for  prob¬ 
lem  difficulty,  a  student  was  penalized  more  by  Total  2  for  less  than 
optimal  solutions  on  easier  problems  than  on  more  difficult  problems. 
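
The four total scores can be sketched as follows for a single student. The record layout (`Attempt`), the function name, and the sample values are hypothetical; the move limit of 28 is the original maximum mentioned in the text.

```python
from statistics import mean
from typing import NamedTuple

class Attempt(NamedTuple):
    """One problem attempted by one student (hypothetical record layout)."""
    moves: int               # legal moves used, at solution or termination
    solved: bool             # whether the problem was solved
    min_moves: int           # minimum (optimal) number of moves
    group_mean_moves: float  # mean moves used by the total group

def total_scores(attempts, max_moves=28):
    """Compute PROPS, PROPM, Total 1, and Total 2 over attempted problems."""
    n = len(attempts)
    props = sum(a.solved and a.moves <= max_moves for a in attempts) / n
    propm = sum(a.solved and a.moves == a.min_moves for a in attempts) / n
    total_1 = mean(a.moves / a.min_moves for a in attempts)         # mean Score 1
    total_2 = mean(a.moves / a.group_mean_moves for a in attempts)  # mean Score 2
    return {"PROPS": props, "PROPM": propm, "Total 1": total_1, "Total 2": total_2}

# Made-up record for one student who attempted three problems.
student = [
    Attempt(moves=8, solved=True, min_moves=8, group_mean_moves=10.0),
    Attempt(moves=15, solved=True, min_moves=10, group_mean_moves=12.0),
    Attempt(moves=28, solved=False, min_moves=14, group_mean_moves=20.0),
]
totals = total_scores(student)
```

Because every score is an average or proportion over the problems attempted, students who worked on different numbers of problems remain comparable.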

Consistency of performance across problems. An important question
for determining the usefulness of this problem type in assessing spatial
problem-solving ability was whether reliable individual differences
on various performance criteria could be identified across problem
replications of similar and varying difficulty levels. To examine
this question, the consistency of the various performance scores was
examined across all 13 problems using Pearson product-moment correlations.
Since, within a given problem, both of the individual problem scores
(Score 1, Score 2) were linear transformations of the number of moves
used, the consistency of these scores across problems in terms of
Pearson product-moment correlations would be the same as the stability
of the number of moves used. Thus, the stability of the following
performance indices was examined:

1.  The  total  number  of  legal  moves  used  for  each  problem, 

2.  The  number  of  illegal  moves,  and 

3.  The  number  of  repeated  moves. 

The  relationship  between  individual  problem  scores  and  total 
scores  on  the  problem  set  as  a  whole  was  investigated  by  examining  the 
correlations  between  individual  problem  scores  (Score  1,  Score  2)  and 
total test scores (PROPS, PROPM, Total 1, Total 2) with and without the
particular problem included in the total score. In addition,
the relationships between the total number of legal moves used (or,
equivalently, Score 1 and Score 2), the number of illegal moves, and
the  number  of  repeated  moves  for  each  problem  were  examined  by  comput¬ 
ing  the  Pearson  product-moment  correlations  between  pairs  of  these  per¬ 
formance  indices  across  students  for  all  pairings  of  problem  replica¬ 
tions. 
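
The equivalence between correlating raw move counts and correlating Score 1 or Score 2 follows from the fact that the Pearson coefficient is unchanged when either variable is divided by a positive constant. A minimal sketch with made-up data for five students on two problems (all names and values are hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up move counts for the same five students on two problems.
moves_a = [10, 12, 15, 20, 28]  # problem with a 10-move minimum
moves_b = [14, 14, 18, 25, 28]  # problem with a 14-move minimum

# Score 1 divides each move count by that problem's minimum path length.
score1_a = [m / 10 for m in moves_a]
score1_b = [m / 14 for m in moves_b]

r_moves = pearson_r(moves_a, moves_b)
r_score1 = pearson_r(score1_a, score1_b)  # same value up to rounding error
```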

Response latencies. During testing the time in seconds taken by a
student  for  every  move  was  recorded  by  the  computer.  This  allowed  la¬ 
tency  trends  across  moves  to  be  plotted  and  studied  for  each  problem. 
Three  indices  were  used  to  quantitatively  characterize  a  student's  re¬ 
sponse  latencies  for  a  problem: 

1.  Initial  move  latency,  i.e.,  how  long  the  student  studied  the 
initial  problem  configuration  before  making  the  first  move; 

2.  The  average  move  latency,  i.e.,  the  average  time  taken  for  a 
move  across  the  particular  problem;  and 

3.  Total  problem  latency,  i.e.,  the  total  time  in  seconds  taken 
by  the  student  on  a  particular  problem. 

In  order  to  compare  the  time  taken  on  various  problems  with  the  problem 
difficulty  as  indexed  by  various  performance  measures,  the  mean  of  the 
above  three  latency  measures  was  computed  across  all  subjects  for  each 
problem. 
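
The three latency indices are simple summaries of the per-move times recorded for one student on one problem. The sketch below assumes that the first recorded time is the interval before the first move and that the average is taken over all moves, including the first; the report does not state either detail, and the function name and data are hypothetical.

```python
def latency_indices(move_latencies):
    """Summarize per-move latencies (in seconds) for one student on one problem."""
    initial = move_latencies[0]            # study time before the first move
    total = sum(move_latencies)            # total time spent on the problem
    average = total / len(move_latencies)  # mean time per move (first move included)
    return {"initial": initial, "average": average, "total": total}

# Made-up latencies for a four-move solution.
lat = latency_indices([12.0, 3.0, 2.0, 3.0])
```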

Although the tendency for various performance measures (e.g., the
number of moves needed) to correlate across problems indexes the
reliability of problem-solving performance, the tendency for a student's
response latencies to show consistency across problems may indicate a
cognitive style, e.g., reflectivity versus impulsiveness (Kagan, 1965;
Kagan et al., 1964) or a strategy of planning-ahead versus trial and
error or impulsive responding. To study this possibility, the consistency
of the initial, average, and total response latency measures
across problems was examined using Pearson product-moment correlations.
For example, by correlating the initial move latency across students
for each pair of problems, it could be determined whether some students
consistently studied each problem for longer or shorter times than
other students. Similarly, by correlating the total problem latency
over students for each possible problem pair, it could be determined
whether the same students who took longer or shorter times to solve one
problem also did so on the other problems.

It was also of interest to examine the response latency trends as
the student progressed throughout each problem. Such trends may indicate
the degree of initial planning, the number of moves a student made
between study points, and the point at which the sequence of moves to
solution had been detected. For this purpose latency graphs for individual
students showing the response latency for each move from start
to solution were plotted and inspected visually. Latency plots were
examined for students who had performed well on the test and those who
had performed poorly and for problems solved and problems unsolved, in
order to detect any systematic differences in latency trends.

Relationship between performance and response latencies. In order
to determine if any relationship existed between students' performance
on the problem and the way they allocated their time on each problem,
Pearson product-moment correlations were computed for each problem
between the initial, average, and total move latencies and the number of
moves each student used. In addition, correlations were also computed
between total test score, which better indexed the student's performance
on the test as a whole, and the initial, average, and total
latencies for each problem. For these correlations the total test scores
used were Total 2 and a mean Judges' performance rating, described
below.

Judges' ratings of performance. Because reliable external criteria
against which the student performance scores could be validated
were  not  available,  each  student's  performance  on  each  problem  was 
studied  independently  by  three  Judges  and  each  student's  overall  test 
performance  was  rated  on  a  10-point  scale,  with  5  being  anchored  to 
average  or  mean  performance,  considering  the  sample  as  a  whole.  The 
mean  of  the  ratings  of  the  three  Judges  (MRATE)  was  used  as  another 
index  of  student  total  test  performance. 

Since the Judges were familiar with the difficulty of each problem
and could carefully examine the student's performance on each problem,
it was felt that these ratings would provide a more complete assessment
and rank ordering of student performance. Although less subjective,
the performance scoring methods described above were not equally able
to take into account all the information that the Judges could in their
ratings. Thus, one way to compare the adequacy or refinement of the
various scoring methods was to compare the rank ordering of students by
each method with the rank ordering assigned by the Judges' ratings.
This was done using Spearman rank-order correlation coefficients.
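
The rank-order comparison can be sketched as a Pearson correlation computed on ranks, with tied values given the average of the ranks they span. The helper names and the data are hypothetical. The example also assumes that higher Judges' ratings mean better performance; under that assumption, agreement with Total 2 (where lower values mean fewer moves) appears as a negative coefficient.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores for four students.
total_2 = [1.0, 1.2, 1.5, 2.0]  # lower = fewer moves than the group average
mrate = [8.0, 7.0, 5.0, 3.0]    # assumed: higher = better rated performance
rho = spearman_rho(total_2, mrate)
```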


To determine how well independent Judges could agree on the ratings
of student performance, interrater reliability as estimated by
the following form of the intraclass correlation was used:

$$R = \frac{MS_{students} - MS_{error}}{MS_{students} + (k - 1)\,MS_{error}}$$

where k is the number of Judges, the various mean squares (MS) were
derived from a standard two-way analysis of variance, and the mean
square for error term represented
variation due to the interaction of students and Judges. Note that
since only the reliability of the rank ordering of students, and not
mean level of differences of Judges' ratings, was of interest (i.e.,
interrater reliability versus interrater agreement), the error term did
not include variation due to Judges (Tinsley & Weiss, 1975).
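
A sketch of this reliability estimate is given below. It assumes the usual form of this intraclass correlation, R = (MS_students - MS_error) / (MS_students + (k - 1) MS_error), with k the number of Judges, and a complete students-by-Judges table with one rating per cell. The function name and the ratings are hypothetical.

```python
def interrater_reliability(ratings):
    """Intraclass correlation from a two-way (students x Judges) layout.

    ratings -- list of rows, one row per student, one column per Judge.
    Variation due to Judges is removed from the error term, so constant
    differences in Judges' mean levels do not lower the estimate.
    """
    n = len(ratings)     # number of students
    k = len(ratings[0])  # number of Judges
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_students = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_students - ss_judges  # students x Judges interaction

    ms_students = ss_students / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_students - ms_error) / (ms_students + (k - 1) * ms_error)

# Made-up ratings: three Judges who differ only by a constant offset.
r_offset = interrater_reliability([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In this made-up case the Judges rank the students identically, so the estimate is 1.0 even though their mean ratings differ.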

Motivational and biographical data. The frequency and percentage
of students endorsing various response alternatives to questions in the
Puzzle Reaction Questionnaire, completed at the end of testing, were
tabulated in order to determine students' prior experience with this
problem type, the perceived difficulty of the instructions of the test,
and the motivation and anxiety level of the students during the test.
Completed posttest questionnaires were obtained from 50 students.
Although the responses to the Puzzle Reaction Questionnaire were analyzed
and provided useful information on the motivational characteristics of
the total group, the small number of students distributing themselves
over various response categories made group performance comparisons
between students in different response categories inappropriate for
many of the questions.

One exception was Question 2, which was especially important because
it involved whether previous practice with problems of this type
would  affect  test  performance.  The  relationship  between  a  student's 
prior  experience  with  this  problem  type  and  his/her  test  performance 
was  determined  by  performing  t  tests  on  the  differences  in  mean  total 
score  (Total  2,  MRATE)  for  those  students  reporting  little  or  no  prior 
experience  with  this  problem  type  versus  students  reporting  much  exper¬ 
ience. 

Since  problems  of  the  type  used  in  this  study  may  require  higher 
levels  of  motivation  than  more  traditional  psychometric  measures,  it 
was also important to investigate the effect of motivation level on
performance  with  the  limited  data  available.  For  this  purpose  t  tests 
were  performed  on  the  performance  means  of  students  reporting  different 
levels  of  motivation  in  Question  4. 

In  addition,  since  males  as  a  group  have  generally  been  found  to 
score  higher  than  females  as  a  group  on  tests  of  spatial  abilities 
(Garai & Scheinfeld, 1968; Maccoby & Jacklin, 1974), it was of interest
to determine whether sex differences existed for this test. Thus, a t
test was used to compare the male and female group mean total scores.

Perceived Difficulty Ratings

Subjective  ratings  of  the  perceived  difficulty  of  replications  of 
the  15-puzzle  were  obtained  from  a  separate  sample  of  students  in  order 
to  investigate  the  following  questions: 

1.  What  subjective  dimensions  do  students  use  in  evaluating  the 
difficulty  of  this  problem  type? 

2. How accurately can students evaluate the actual difficulty of
these  problems?  That  is,  do  difficulty  ratings  agree  with 
actual  performance  data?  How  finely  can  discriminations  be 
made  between  problems  of  similar  difficulty  levels? 

3. Are there reliable individual differences in the perceived
difficulty of these problems and in the ability to make finer
discriminations?

The  latter  two  questions,  in  particular,  address  indirectly  the  ques¬ 
tion  of  whether  students'  perceptions  of  task  difficulty  can  be  related 
to  their  performance.  For  example,  to  the  extent  that  reliable  indi¬ 
vidual  differences  in  the  ability  to  visualize  a  sequence  of  moves  in 
short-term  memory  exist,  this  might  be  expected  to  result  in  reliable 
differences  in  both  perceived  task  difficulty  and  in  actual  task  per¬ 
formance. 

To maximally associate perceived difficulty with actual performance,
the same students would ideally make the ratings and solve the
problems. Due to limitations in student time, this was not possible in
the  present  study;  instead,  a  second  sample  from  the  same  population 
was  utilized.  Using  separate  samples  for  the  two  tasks  has  the  advan¬ 
tage  that  a  student's  rating  of  problem  difficulty  would  not  influence 
or  be  influenced  by  actual  performance  on  the  problem. 

Procedure 

Subjects. A total of 47 students from an introductory level psychology
course rated the difficulty of 67 stimuli. Each stimulus consisted
of a typed start-and-goal configuration for one 15-puzzle on an
index  card.  To  shorten  the  length  of  the  rating  task  for  each  student, 
the  67  puzzles  were  divided  into  4  sets  of  16  or  17  puzzles  each  and 
the  47  subjects  were  randomly  assigned  to  one  of  the  4  puzzle  sets. 
Since  the  students  were  divided  into  groups  merely  to  shorten  the  task, 
analyses  were  generally  carried  out  for  the  sample  as  a  whole;  thus, 
the  results  will  not  be  discussed  separately  for  each  group. 

Data for three students were not included in the analysis because
they  failed  either  to  perform  or  to  record  their  ratings  in  accordance 
with  instructions.  Students  took  an  average  of  about  40  to  45  minutes 
to  complete  the  rating  task. 

Puzzle stimuli. Selection of the 67 puzzles used in this study
was done with care because they were to be used in several ways. For
example,  in  order  to  be  able  to  trace  the  perceived  difficulty  trend 
within  a  single  puzzle  (which  might  require  16  moves  from  start  to 
goal),  ratings  were  obtained  for  several  puzzles  with  the  same  goal 
configuration  but  with  start  configurations  that  converged  on  the  goal. 
As  a  result,  it  was  possible  to  detect  how  many  moves  from  the  goal  a 
student  would  have  to  be  before  the  problem  would  begin  to  look  some¬ 
what  easy,  then  easy,  and  so  on. 

Since  one  hypothesized  difficulty  dimension  was  that  of  path 
length  (or  number  of  moves  required),  puzzles  utilized  a  relatively 
uniform  continuum  of  path  lengths  from  1  to  26.  Of  the  12  problems 
used  in  the  problem-solving  performance  portion  of  the  study,  9  were 
included  among  the  stimuli  rated  in  the  rating  task.  Of  these  9,  4 
were  divided  into  subpuzzles  of  varying  lengths,  as  described  above,  in 
order  to  examine  the  perceived  difficulty  trend  within  the  individual 
problems.

Rating procedure. Appendix D contains a copy of the self-administered
instruction and recording booklet that each student received.
Students were told how this type of problem was solved so that they
could rate how difficult they thought it would be if they had to solve
it. Students first sorted the puzzles into six categories labeled Very
Difficult, Difficult, Somewhat Difficult, Somewhat Easy, Easy, and Very
Easy. It was made clear to the students that there was no required
number of puzzles to be sorted into any of the piles but that they
should  put  each  puzzle  into  the  category  that  had  a  label  best  describ¬ 
ing  how  difficult  they  thought  the  puzzle  would  be  to  solve.  In  each 
puzzle  set  four  of  the  puzzles  were  specially  selected  ahead  of  time  to 
range  from  Very  Easy  to  Very  Difficult,  in  terms  of  path  length.  These 
four  puzzles  had  a  special  message  on  the  index  card  instructing  the 
student  to  provide  reasons,  or  a  basis,  for  sorting  the  stimuli  into  a 
specific  category.  These  reasons,  along  with  the  posttask  questions 
(see  Appendix  D)  regarding  what  rules  or  criteria  they  used  for  sorting 
into  each  of  the  six  categories,  constituted  the  protocols  that  were 
later  analyzed  to  determine  the  dimensions  on  which  the  students 
thought  they  were  sorting. 

After  recording  the  puzzles  that  were  sorted  into  the  original  six 
categories,  students  were  asked  to  attempt  to  break  down  each  category 
into  subcategories  based  on  finer  difficulty  discriminations.  The  stu¬ 
dents  were  encouraged  to  subdivide  into  as  many  subcategories  as  they 
could  but  only  to  do  so  if  they  felt  they  could  differentiate  the  dif¬ 
ficulty  of  the  puzzles  in  the  same  category.  No  re-sorting  across  the 
original  six  categories  was  allowed.  After  recording  the  stimuli  in 
each  of  the  final  subdivided  categories,  students  responded  to  a  ques¬ 
tionnaire  that  gathered  information  about  their  prior  experience  with 
this  kind  of  puzzle,  whether  they  had  difficulty  understanding  the 
task,  and  their  motivation  level  during  the  study.  More  importantly, 
students  provided  their  own  rules  or  criteria  for  sorting  into  each  of 
the  categories,  for  example,  how  they  distinguished  a  Very  Easy  from  a 
Somewhat  Easy  puzzle. 

On  the  last  page  of  the  booklet,  and  after  the  students  had  al¬ 
ready  volunteered  their  own  rating  dimensions  or  rules,  a  list  of  nine 
dimensions  was  provided,  which  were  hypothesized  to  be  related  to  stu¬ 
dents'  ratings.  Students  were  asked  to  indicate  for  each  of  the  nine 
dimensions  whether  they  considered  it  in  all,  most,  some,  or  none  of 
the puzzle ratings. These nine dimensions also included two dimensions
that were supposed to serve as validity dimensions (see Questions 8c
and 8f in Appendix D). It was felt that these dimensions (particularly
8f) would be irrelevant to perceived difficulty and would therefore
serve  to  detect  students  who  were  randomly  responding  or  feeling  that 
they  should  have  used  every  dimension  suggested  by  the  experimenter. 

Analysis 

Reported dimensions of difficulty. Self-reported dimensions of
perceived difficulty were thus of two types in this study. First, students
voluntarily provided the basis for their difficulty judgments
during the sorting task. During this portion of the task, students
were provided no information as to the dimensions to be used in making
their judgments. Second, after the students had sorted the puzzles into
piles representing different perceived difficulty levels, an
experimenter-provided list of possible rating dimensions was presented
and students indicated whether they used each dimension on all, most,
some, or none of the problems.

For each type of self-report (the voluntary protocols and the
experimenter-provided dimensions), the proportion of students reporting
use of each dimension was calculated and a determination was made of
the most frequently used or important rating dimensions. Judgments of
which dimensions were being reported during the sorting task were made
by one graduate and one undergraduate research assistant and involved
studying  the  students'  written  responses  to  the  "Provide  your  reasons" 
section  of  the  rating  booklet  (see  Appendix  E,  Step  1)  and  Questions  5, 
6,  and  7  in  the  postrating  task  questionnaire  (see  Step  4  in  Appendix 
E).  Representative  protocols  provided  by  the  students  to  indicate  use 
of  each  reported  rating  dimension  are  contained  in  Appendix  E. 

Difficulty ratings. Scale values representing mean
perceived difficulty were obtained from the final subdivided category
sorting of the puzzles. The center point of each of the original six
categories was assigned the number 5, 15, 25, 35, 45, or 55 for the
respective categories Very Easy, Easy, Somewhat Easy, Somewhat Difficult,
Difficult, and Very Difficult. When puzzles within one of these
six  categories  were  subdivided  into  subcategories,  the  five  integer 
intervals  on  each  side  of  the  center  point  were  prorated  or  divided  to 
assign  differential  rating  values  to  each  puzzle.  The  mean  rating 
across  all  students  was  then  computed  to  obtain  the  subdivided  scale 
values.  These  subdivided  scale  values  were  then  divided  by  10  to  scale 
them  from  1  to  6,  thus  making  them  comparable  to  the  original  category 
labels.  Thus,  a  puzzle  felt  to  be  Very  Easy  by  the  average  student 
would  have  a  scale  value  in  the  range  of  about  .5  to  1.5,  an  Easy  puz¬ 
zle's  scale  value  would  range  from  1.5  to  2.5,  and  so  on. 

These  scale  values  were  then  used  to  determine  the  range  of  prob¬ 
lems  (e.g.,  problems  requiring  three  to  six  moves)  perceived  to  be  in 
each of the categories (e.g., Very Easy, Easy) by plotting the scale
values  versus  the  solution  path  lengths  of  the  puzzles.  Finally,  the 
relationship  between  perceived  difficulty  and  actual  performance  on  the 
set  of  puzzles  administered  to  the  first  group  of  students  was  investi¬ 
gated  by  correlating  mean  difficulty  ratings  with  the  performance  and 
response  latency  measures  obtained  for  the  nine  puzzles  that  were  in¬ 
cluded  in  both  the  performance  and  difficulty  rating  portions  of  this 
study. 

Relationships Between Objective and Subjective Difficulty Indices

Each  of  the  performance  measures,  response  latency  measures,  phys¬ 
ical  problem  characteristics,  and  the  perceived  difficulty  mean  ratings 
can  be  considered  potential  problem  difficulty  indices.  For  example, 
the  difficulty  of  a  problem  could  be  indexed  in  several  ways:  (1)  by 
the  proportion  of  persons  solving  it,  (2)  by  the  average  response  la¬ 
tency  used  in  working  on  the  problem,  or  (3)  by  the  number  of  squares 
needing  to  be  moved  large  distances  in  the  pattern.  The  similarity  of 
the  rank  orders  of  various  objective  indices  will  likely  vary. 

In  addition,  the  rank  orderings  of  the  problem  difficulties  by 
performance  or  physical  indices  obtained  in  the  first  part  of  the  study 
can  be  compared  with  the  rank  ordering  of  subjective  (perceived)  diffi¬ 
culty  obtained  in  the  second  part  of  the  study.  For  this  purpose,  the 
Spearman  rank-order  correlation  coefficient  was  computed  between  the 
rank orders of problem difficulty provided by all performance, latency,
physical, and perceived difficulty indices. Some of the questions
addressed through examination of these correlations were as follows:

1. Do the performance criteria used in this study (mean number of
moves,  proportion  solving  the  problem,  proportion  solving  it 
in  the  minimum  number  of  moves)  similarly  index  problem  diffi¬ 
culty? 

2. Do problems that take the most total time to solve or that
require longer average move latencies also involve longer initial
study times or latencies?

3.  Is  there  a  relationship  between  the  difficulty  of  a  problem  as 
indexed  by  performance  criteria  and  the  initial  move  latency, 
average  move  latency,  or  total  time  taken  in  solving  the  prob¬ 
lem? 

4.  How  well  does  the  perceived  difficulty  of  the  problems  compare 
with  the  actual  difficulty  as  indexed  by  performance  and  la¬ 
tency  data  and  various  physical  attributes  of  the  problem? 

5.  Which  physical  characteristics  of  the  problem  (e.g.,  path 
length,  number  of  squares  out  of  order)  are  most  predictive  of 
various  performance  and  latency  measures? 
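
The rank-order comparison described above can be sketched in a few lines.
This is a minimal illustration, not the analysis program used in the study:
it assumes that tied values receive the mean of their ranks and that the
Spearman coefficient is computed as the Pearson correlation of those ranks.
The function names and the five-problem data set are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five problems (illustration only, not study data).
path_length = [4, 8, 8, 10, 14]
prop_solved_optimally = [0.93, 0.76, 0.78, 0.87, 0.30]
rho = spearman_rho(path_length, prop_solved_optimally)
```

A negative coefficient here would mean that problems with longer solution
paths tended to be solved optimally by a smaller proportion of students.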


RESULTS 

Computer-Administered  Problems 
Problem  Characteristics 

Indices of problem difficulty. Table 1 shows the number of students
who attempted each problem (including the practice problem, Problem 1),
the optimal or minimal number of moves required to solve each problem
(path length), and the frequency and percentage of students who used
various numbers of moves before solving or giving up working on the
problem. These data suggest that most of the problems were too easy,
with from 70.4% to 93.2% of the students solving 9 of the 13 problems
in the optimal number of moves. Problems 10, 12, 13, and, to a lesser
extent, Problem 9 were more challenging, with from 14.6% to 45.7% of
the students solving the problems in the optimal number of moves. The
data in Table 1 also show that the optimal number of moves was not a
perfect indicator of difficulty as indexed by student performance.
Problems 4 and 5, which could optimally be solved in 8 moves, were
solved in the optimal number of moves less frequently (75.9% and 77.8%)
than Problem 6 (87.0%), for which the optimal number of moves was 10.
Similarly, Problems 10 and 11 could both be solved optimally in 14
moves, but only 29.5% of the students solved Problem 10 in that number
of moves, whereas 79.6% of the students solved Problem 11 in the
optimal number of moves.

Additional data on student performance characteristics of the problems
are shown in Table 2. With the exception of Problems 9, 10, 12, and 13,
the mean number of moves used on each problem (row 1 of Table 2) was
quite close to the minimum number of moves required for its solution
(row 9). Row 2 of Table 2 shows that all students solved the first five
problems in the allowed maximum number of moves (for the longer
problems the maximum number of moves allowed was 26), and only for
Problems 12 (66.6% solving) and 13 (66.4% solving) were there
substantial numbers of students failing to solve the problems. Row 3
reports the proportion of students solving each problem in the optimal
number of moves. With the exception of Problems 9, 10, 12, and 13, 70%
or more of the students were able to solve the rest of the problems in
the optimal number of moves.

Row  4  of  Table  2  contains  the  Special  Difficulty  Index,  which  ad¬ 
justs  for  the  differing  path  lengths  (minimum  number  of  moves  required) 
of  the  problems.  For  example,  for  Problem  4  this  index  equaled  1.21, 
indicating that the average student required 21% more than the minimum
number  of  moves  to  solve  the  problem.  The  difficulty  of  each  problem 
as  indicated  by  this  index  agreed  quite  well  with  the  performance  data 
in  rows  1  through  3.  Again,  only  Problems  9,  10,  12,  and  13,  with  spe¬ 
cial  indexes  of  1.45,  1.44,  1.50,  and  1.51,  required  substantial  num¬ 
bers  of  moves  over  the  minimum  number  required  for  solution. 
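
As a worked illustration of the index just described, the sketch below
computes the Special Difficulty Index as the mean number of moves used by
the sample divided by the minimum number of moves required. The function
name and the list of move counts are hypothetical; only the definition
comes from the text.

```python
def special_difficulty_index(moves_per_student, min_moves):
    """Mean number of moves used by the sample divided by the path length."""
    mean_moves = sum(moves_per_student) / len(moves_per_student)
    return mean_moves / min_moves


# Hypothetical move counts for eight students on a problem whose optimal
# solution takes 8 moves (illustration only, not study data).
moves = [8, 8, 8, 10, 12, 8, 14, 8]
sdi = special_difficulty_index(moves, min_moves=8)
```

An index of 1.00 would mean that every student solved the problem in the
minimum number of moves; values above 1.00 give the proportional excess.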

A  comparison  of  the  performance  index  and  the  Special  Difficulty 
Index with the minimum number of moves required (row 9) indicates that
although  the  difficulty  of  the  problems  tended  to  increase  with  solu¬ 
tion  path  length  (minimum  number  of  moves  required),  the  relationship 
was  not  strictly  monotonic.  For  example,  although  Problem  11  required 
at  least  14  moves  for  solution,  this  problem  was  much  easier  for  stu¬ 
dents  than  some  of  the  problems  requiring  fewer  moves.  Thus,  minimal 
solution  path  was  not  the  sole  determinant  of  a  problem's  difficulty. 

Mean problem latencies. Rows 5 through 7 of Table 2 show the mean
initial, average, and total latencies of students for each of the 13
problems. The data on average amount of time spent by students prior
to their first move (mean initial latency) indicate a strong, though
not perfect, relationship with the difficulty of the problem as indexed
by the other performance criteria. This relationship appears even
stronger  for  the  total  time  in  seconds  used  by  the  average  student  (row 
7)  to  solve  problems  of  varying  difficulty.  For  example,  the  mean  ini¬ 
tial  and  total  move  latencies  were  smallest  for  two  of  the  problems 
with  the  shortest  path  lengths  (Problems  2  and  3)  and  were  longest  for 
the  four  problems  with  the  longest  path  lengths  (Problems  9  through 
13).  The  trend  for  the  remaining  seven  problems  was  less  consistent, 
except  that  students  seemed,  not  surprisingly,  to  use  more  time  to 
study  and  to  complete  the  practice  problem  than  would  be  predicted  on 
the  basis  of  its  short  path  length.  Students  usually  took  about  20  to 
60  seconds  to  make  their  first  move,  whereas  total  time  working  on  a 
single  problem  ranged  from  about  67  to  361  seconds.  Most  problems  were 
solved  in  about  2.5  minutes  (150  seconds)  or  less. 

There  appeared  to  be  no  consistent  relationship  between  path 
lengths  of  the  problems  and  the  average  latency  for  the  moves  within  a 
single problem (row 6 in Table 2). Students generally took from 6 to
15  seconds  to  make  a  single  move,  although  again  more  time  was  taken  on 
the  practice  problem  (Problem  1). 

Perceived scale values. Row 8 of Table 2 shows the mean perceived
difficulty scale values for the nine test problems that were included
in the perceived difficulty rating portion of the study. Given the
assignment of the numbers 1, 2, 3, 4, 5, and 6, respectively, to the
categories Very Easy, Easy, Somewhat Easy, Somewhat Difficult,
Difficult, and Very Difficult, row 8 shows that none of the problems
was considered Difficult or Very Difficult by the average student. Four
problems (9, 10, 12, and, to a lesser extent, 3) were considered
Somewhat Difficult, and the remaining problems (4, 6, 7, 11, and 13)
were perceived as Easy or Somewhat Easy. The difficulty perceptions
generally indicated agreement with actual performance indices of
difficulty, but there were some marked exceptions. In particular,
Problems 3 and 11 were perceived as being more difficult than was
indicated by the performance data, whereas the most difficult problem
(Problem 13, with a Special Difficulty Index of 1.51 in row 4) was
perceived as being Somewhat Easy by the average student. These data
indicate that students' initial difficulty perceptions of these
problems are fallible, particularly for problems with longer solution
paths.

Illegal  and  repeated  moves.  Rows  10  and  11  of  Table  2  contain  the 
mean  number  of  illegal  and  repeated  moves  made  on  each  problem.  These 
data  indicate  that  students  made  few  illegal  or  repeated  moves  (means 
less  than  1.0)  on  most  of  the  problems  and  that,  with  the  exception  of 
Problems  10,  12,  and  13,  there  seemed  to  be  little  if  any  relationship 
between  the  difficulty  or  the  minimum  number  of  moves  required  and  the 
number  of  illegal  or  repeated  moves.  For  Problems  10,  12,  and  13,  how¬ 
ever,  the  average  student  made  approximately  one  or  more  illegal  and 
one  or  more  repeated  moves.  This  is  to  be  expected  for  the  more  diffi¬ 
cult  problems,  since  the  students  worked  longer  on  them  and  thus  had  a 
greater  chance  of  making  typing  errors  and  other  illegal  moves.  It 
would  be  difficult  to  unconfound  this  tendency  with  any  tendency  to  be 
more  careless  on  the  more  difficult  problems.  The  slightly  increased 
number  of  repeated  move  configurations  for  Problems  10,  12,  and  13  may 
be  more  meaningful,  indicating  a  greater  likelihood  of  students  needing 
to  back  up  in  their  solutions  to  the  more  difficult  problems.  Because 
of  the  small  number  of  illegal  and  repeated  moves  made  by  the  average 
student  on  these  problems,  these  measures  were  not  considered  further 
as potential indices of problem difficulty (e.g., they do not appear in
Table  3). 

Relationships among indices of problem difficulty. Table 3 shows
rank-order correlations among the potential indices of problem
difficulty: performance indices, latency measures, perceived
difficulty, and various physical problem characteristics. Data for
Variables 1 through 9 are in Table 2; data for Variables 10 through 14
are in Appendix C for each of the problems.

The  correlations  in  rows  2  through  4  of  Table  3  show  that  the  dif¬ 
ficulty  indices  based  on  group  performance  data  rank  ordered  the  diffi¬ 
culty  of  the  problems  quite  similarly,  with  the  strongest  agreement 
between  the  Special  Difficulty  Index  and  the  proportion  of  students 
solving the problem in the optimal number of moves (ρ = -.95) and between
the  mean  number  of  moves  used  and  the  proportion  of  students  solving 
the problem in the maximum allowed moves (ρ = -.94). The utility of the
Special Difficulty Index over the other performance indices of
difficulty is suggested by its lower correlation with solution path
length (ρ = .77). For example, using the mean number of moves required
by the sample to solve different problems is less adequate as an
indicator of problem difficulty because it labeled all puzzles with
long solution paths as difficult (ρ = .93) when, in fact, not all long
puzzles were difficult (e.g., Problem 11).

The intercorrelations of the latency variables in rows (and columns)
5 through 7 of Table 3 indicate that only the mean initial and total
latency measures rank ordered the problem difficulties similarly.
That is, problems which took longer times to solve were also studied
longer initially (ρ = .84), but the average time for moves within a
problem was not significantly related to either the initial move
(ρ = .25) or the total problem latency (ρ = .08).

The correlations between the latency variables (rows 5 through 7)
and the performance variables (columns 1 to 4) show that the total time
spent on a problem (row 7) by the average student was highly predictive
(ρ's = .84, -.76, -.89, .86) of difficulty as indexed by performance
indices, and the amount of initial study time spent by the average
student (row 5) was also strongly related to the four performance
indices (ρ's = .65, -.63, -.67, .63). That is, not surprisingly, more
difficult problems were studied longer initially and took longer to
solve. The correlations in columns 5 to 7 of row 9 also show a strong
relationship between mean initial and total latency and solution path
length (ρ's = .61 and .79, respectively), indicating that the problems
with longer solution paths were studied longer initially and worked on
longer.

The correlations in columns 1 to 4 of row 8 show that students'
perceptions of problem difficulties agreed somewhat, but not as much as
might be expected, with the actual performance measures (ρ's = .64,
-.63, -.43, .40). Although all these correlations were in the
appropriate direction, only the first two approached statistical
significance due to the small number of problems (nine) for which both
performance and perceived difficulty indices were available. The
perceived difficulty scale values in row 8 of Table 2 suggest that this
lower-than-expected relationship was due to the students' inability to
differentiate the relative difficulties of problems with longer
solution paths (such as those used in Problems 9 through 13). The
correlation between perceived difficulty and solution path length
(ρ = .63) was not as high for the problems solved on the computer as
for the larger stimulus set used in the rating study (r = .88),
probably because the range of path lengths used in the computer test
was more restricted.

The  only  significant  correlation  between  perceived  difficulty  and 
latencies  (columns  5  to  7)  was  with  the  mean  initial  latency  measure 
(ρ = .75). In fact, this represented the highest correlation in the
matrix  for  both  variables.  This  relationship  suggests  that  the 
problems  that  were  studied  longest  before  a  move  was  made  were  the  ones 
perceived  as  being  most  difficult  (even  more  than  whether  or  not  these 
problems  actually  were  the  most  difficult). 

Examination of the correlations in column 8 shows that perceived
difficulty of the problems in the test was significantly related to
only two physical problem characteristics: solution path length
(ρ = .63) and number of rows not matching in the two patterns
(ρ = .70). Correlations with some of the other physical problem
characteristics, e.g., the number of squares not matching and the
Euclidean and City-Block distance functions, were probably restricted
by the reduced range of values in the computerized test as opposed to
the rating study (see section below on dimensions of perceived
difficulty).

Examination of rows 9 to 14, columns 1 to 4, shows that only the
solution path length (row 9), and to a lesser extent the Euclidean and
City-Block distances (rows 13 and 14), were useful in predicting
difficulty as indexed by the four performance measures of difficulty.
Solution path length rank ordered problem difficulty quite similarly to
the four performance measures (ρ's = .98, -.91, -.80, and .77), being
most independent of the Special Difficulty Index (ρ = .77). The two
distance functions moderately predicted mean number of moves
(ρ's = -.36 and -.43, neither significant) and the Special Difficulty
Index (ρ's = .31 and .35, neither significant).

Solution path length (row 9) was the only physical problem
characteristic to predict mean initial (ρ = .61) or mean total
(ρ = .79) problem latency. Interestingly, while average move latency
(column 6) was not related to any of the performance criteria, it was
inversely related to three physical problem characteristics: the
number of squares not matching (ρ = -.67), the Euclidean distance
function (ρ = -.68), and the City-Block distance function (ρ = -.71).
These negative correlations
suggest  the  possibility  that  students  worked  faster  and  made  moves  more 
quickly  when  they  could  see  that  many  numbers  would  need  to  be  moved, 
especially  if  these  numbers  had  to  be  moved  long  distances  in  the  puz¬ 
zle. 


The  intercorrelations  of  the  physical  problem  characteristics  in 
rows  and  columns  9  to  14  show  that  the  more  highly  related  problem 
characteristics  were  solution  path  length,  the  number  of  squares  not 
matching,  and  the  two  distance  functions.  For  this  set  of  problems, 
the Euclidean and City-Block distances were virtually identical
(ρ = .98). Although the number of rows not matching did not relate to
other physical problem characteristics, the number of columns not
matching did correlate with the number of squares not matching
(ρ = .81) and the Euclidean distance measure (ρ = .60). Whether the
number of rows or columns not matching was more or equally related to
other physical indices, however, is strictly dependent on the
particular set of problem replications used.

Assessment  of  Individual  Student  Performance 

Scoring methods. For each individual problem two scores were computed:
Score 1, defined as the number of moves the student required divided by
the minimum number required, and Score 2, defined as Score 1 divided by
(corrected for) the Special Difficulty Index. Four total scores were
also derived: Total 1 and Total 2 were the averages over the problems
attempted of Score 1 and Score 2, respectively, and PROPS and PROPM
were the proportion of problems attempted that were solved within the
maximum allowed moves (PROPS) and in the minimum number of moves
(PROPM). Table 4 shows the means, standard deviations, and range of
all these scores for the present sample.
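
The score definitions above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the study's scoring program:
the `Attempt` record, the `score_student` function, and the example student
are hypothetical. The path lengths for Problems 4, 6, and 11 (8, 10, and 14
moves) are those given in the text, and the Special Difficulty Index values
are the mean Score 1 values reported for those problems.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    moves: int    # number of moves the student used on the problem
    solved: bool  # True if solved within the maximum allowed moves


def score_student(attempts, min_moves, sdi):
    """Compute the problem scores and four total scores for one student."""
    # Score 1: moves used relative to the minimum number required.
    score1 = {p: a.moves / min_moves[p] for p, a in attempts.items()}
    # Score 2: Score 1 corrected for the problem's Special Difficulty Index.
    score2 = {p: s / sdi[p] for p, s in score1.items()}
    n = len(attempts)
    solved_optimally = sum(
        a.solved and a.moves == min_moves[p] for p, a in attempts.items()
    )
    return {
        "Score 1": score1,
        "Score 2": score2,
        "Total 1": sum(score1.values()) / n,
        "Total 2": sum(score2.values()) / n,
        "PROPS": sum(a.solved for a in attempts.values()) / n,
        "PROPM": solved_optimally / n,
    }


min_moves = {4: 8, 6: 10, 11: 14}
sdi = {4: 1.21, 6: 1.12, 11: 1.11}

# Hypothetical student who attempted three problems (not study data).
attempts = {4: Attempt(8, True), 6: Attempt(12, True), 11: Attempt(14, True)}
scores = score_student(attempts, min_moves, sdi)
```

Because every total is an average or proportion over the problems a
student actually attempted, the totals remain defined for students who
skipped some problems.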

Note  that  although  not  all  students  worked  on  each  individual 
problem,  thus  not  having  a  score  (Score  1,  Score  2)  for  each  problem, 
the four total scores were obtainable for all students (N = 55) as a
result  of  the  way  these  scores  were  defined.  PROPS  and  PROPM  can  be 
considered  additive  scores,  which  essentially  total  the  number  of  prob¬ 
lems  solved  or  solved  optimally;  whereas  Total  1  and  especially  Total  2 
take  into  account  the  pattern  of  scores  across  the  problems  attempted. 
The latter two scores would appear to be particularly appropriate for
adaptive testing where not all students work on the same problems.

Table 4

Mean, Standard Deviation, and Range of Four Total
Scores and Thirteen Individual Problem Scores

                                   Standard   Best Score   Poorest Score
Score     Problem    N     Mean    Deviation  Obtained(a)  Obtained(a)

PROPS        --      55     .83       .12        1.00          .50
PROPM        --      55     .66       .16        1.00          .36
Total 1      --      55    1.25       .18        1.00         1.70
Total 2      --      55    1.00       .14         .84         1.38

Score 1       1      55    1.01       .09        1.00         1.67
              2      33    1.13       .52        1.00         3.25
              3      33    1.01       .06        1.00         1.33
              4      54    1.21       .56        1.00         3.25
              5      54    1.23       .55        1.00         3.25
              6      54    1.12       .41        1.00         2.80
              7      54    1.28       .49        1.00         2.80
              8      50    1.24       .49        1.00         2.33
              9      46    1.45       .54        1.00         2.33
             10      44    1.44       .40        1.00         2.00
             11      49    1.11       .27        1.00         2.00
             12      48    1.50       .29        1.00         1.75
             13      49    1.51       .31        1.00         1.75

Score 2       1      55    1.00       .09         .99         1.65
              2      33    1.00       .08         .89         2.88
              3      33    1.00       .06         .99         1.32
              4      54    1.00       .46         .83         2.69
              5      54    1.00       .44         .81         2.64
              6      54    1.00       .37         .89         2.50
              7      54    1.00       .38         .78         2.19
              8      50    1.00       .40         .81         1.88
              9      46    1.00       .37         .69         1.61
             10      44    1.00       .28         .69         1.39
             11      49    1.00       .25         .90         1.80
             12      48    1.00       .19         .67         1.17
             13      49    1.00       .20         .66         1.16

(a) Note that higher numbers represent better scores for the PROPS
and PROPM scores and lower numbers reflect better scores for the
Total 1 and Total 2 scores.

From the mean PROPS score it can be seen that the average student
solved 83% of the problems attempted in the maximum allowable moves.
At least one student solved all the problems attempted (best score =
1.00), and the student with the poorest score (.50) solved only half of
the problems attempted. The PROPM data indicate that the average
student solved 66% of the problems attempted in the optimal number of
moves, with proportions ranging from 100% to 36% solved optimally. The
Total 1 mean score shows that the average student required 25% (mean =
1.25) more moves than optimally required to solve the average problem.
At least one student averaged 70% more moves than required (poorest
score = 1.70), and one solved all problems attempted in the minimum
number of moves (best score = 1.00).

The Total 1 score represents the proportion of moves beyond the
minimum number possible, and the Total 2 score represents the
proportion of moves greater or less than the mean number required by
the group as a whole. This is also true for the difference between the
two individual problem scores, Score 1 and Score 2. Thus, by
definition, the mean Total 2 score and mean Score 2 equal 1.00. The
best Total 2 score was .84, indicating an average problem solution of
16% fewer moves than the group norm, whereas the poorest Total 2 score
was 1.38, indicating that one student required 38% more moves on the
average than did the average student in the group.

By definition, the mean Score 1 for each problem will be equal to
the Special Difficulty Index (i.e., mean number of moves required by
the sample divided by optimal number of moves). However, the data in
Table 4 show differing levels of difficulty for the 13 problems as
indexed by Score 1. For example, Problem 9 (mean Score 1 = 1.45) was
more difficult for the sample than Problem 4 (mean Score 1 = 1.21),
since Problem 9 required an average of 45% more moves than the optimal
number versus 21% more moves than the optimal number for Problem 4.

Table 5

Independent Judges' Ratings and Mean Rating (MRATE) of
Total Test Performance for Each Student

               Judge                                Judge
Student    1     2     3    MRATE    Student    1     2     3    MRATE

   1       6     6     7     6.3       30       6     4     4     4.7
   2       7     5     6     6.0       31       6     5     7     6.0
   3       6     6     7     6.3       32       3     3     3     3.0
   4       4     4     5     4.3       33       2     3     3     2.7
   5       5     5     5     5.0       34       7     8     6     7.0
   6       7     7     8     7.3       35       8     6     7     7.0
   7       6     6     8     6.7       36       5     4     7     5.3
   8       4     3     6     4.3       37       2     3     6     3.7
   9       7     5     7     6.3       38       7     6     2     5.0
  10       8     6     8     7.7       39       6     4     5     5.0
  11       4     3     6     4.3       40       5     5     5     5.0
  12       8     8     8     8.0       41       5     5     6     5.3
  13       5     5     6     5.3       42       4     5     6     5.0
  14       4     4     5     4.3       43       9     8     9     8.7
  15       2     2     3     2.0       44       4     3     2     3.0
  16       3     3     4     3.3       45       5     4     6     5.0
  17       8     8     8     8.0       46       7     7     8     7.3
  18       7     5     6     6.0       47       5     5     5     5.0
  19       6     5     5     5.3       48       5     5     6     5.3
  20       5     5     5     5.0       49       3     3     3     3.0
  21       4     5     5     4.7       50       8     8     8     8.0
  22       3     3     3     3.0       51       7     6     7     6.7
  23       8     6     7     7.0       52       2     2     1     1.7
  24       6     4     5     5.0       53       7     5     8     6.7
  25       6     5     5     5.3       54       2     3     2     2.3
  26       3     3     3     3.0       55       1     2     2     1.7
  27       8     8     7     7.7      Mean     5.3   4.8   5.4    5.2
  28       3     3     3     3.0       SD      2.0   1.7   2.0    1.8
  29       6     5     5     5.3
Score 2, like Total 2, indexes performance relative to the mean
student. As a result, the mean Score 2 across all students is 1.00 for
each problem by definition. Values of Score 2 below 1.00 indicate
fewer moves than the average student, and scores greater than 1.00
reflect more moves than the average student. Examination of the best
and poorest values of Score 2 indicates considerable variability in
student performance on the problems. The best student on Problem 13
completed the problem in two-thirds of the average number of moves
required by the average student, and the poorest student on Problem 2
required 2.88 times the average number of moves.

Judges' ratings of performance. Table 5 contains the ratings on a
10-point scale of each student's overall test performance by three
independent judges and the resulting mean rating (MRATE) used as a
criterion in this study against which to compare the alternative
scoring methods. The mean and standard deviation of each judge's
ratings and the overall mean ratings are also shown. The means of each
column were all close to 5.0, which is appropriate, since the judges
were instructed to assign a rating of 5.0 to students with average
performance. The similar standard deviations indicate a comparable
spread of judgments by each judge. For only 6 of 55 students did any
two judges differ by more than 2 in their assigned ratings; of these 6
students, 4 were inconsistent in that they performed either very well
on most problems and very poorly on a few (Students 8 and 11) or well
on some difficult problems but less well on easier ones (Students 37
and 53). One of the students (Student 36) did not have data for three
problems on an important part of the test, making it difficult to
evaluate that student's overall performance on the test.
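
The mean rating and the disagreement check described above can be sketched
with a subset of the rows of Table 5. The variable names are hypothetical,
and rounding MRATE to one decimal place is an assumption that happens to
match the tabled values for these students.

```python
# Ratings by Judges 1, 2, and 3 for a subset of students from Table 5.
ratings = {
    5: (5, 5, 5),
    8: (4, 3, 6),
    11: (4, 3, 6),
    12: (8, 8, 8),
    36: (5, 4, 7),
    37: (2, 3, 6),
    38: (7, 6, 2),
    53: (7, 5, 8),
}

# MRATE: mean of the three judges' ratings, rounded to one decimal place.
mrate = {s: round(sum(r) / len(r), 1) for s, r in ratings.items()}

# Students for whom any two judges differ by more than 2 rating points,
# which is equivalent to the highest and lowest ratings differing by > 2.
flagged = sorted(s for s, r in ratings.items() if max(r) - min(r) > 2)
```

Applied to this subset, the check flags the six students with large
disagreements and leaves Students 5 and 12, on whom the judges agreed
exactly, unflagged.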

Table 6 shows the results of the interrater reliability analysis.
As Table 6 shows, most of the variance in ratings was due to individual
differences in student performance, and substantial interrater
reliability (ρ₁ = .80) was obtained.


Table 6

Sources of Variance in Performance Ratings

Source of Variance      df        SS       MS

Students (s)            54     502.5      9.3
Judges (j)               2      10.6      5.3
Error (s x j)          108      75.4       .7
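
The reported reliability can be approximately reproduced from the mean
squares in Table 6. The report does not state which formula was used, so
the sketch below is an assumption: it applies a common analysis-of-variance
intraclass estimate for the reliability of a single judge, together with
the corresponding estimate for the mean of all judges. The function names
are hypothetical.

```python
def single_judge_reliability(ms_students, ms_error, n_judges):
    """Intraclass estimate of the reliability of one judge's ratings."""
    return (ms_students - ms_error) / (ms_students + (n_judges - 1) * ms_error)


def mean_rating_reliability(ms_students, ms_error):
    """Intraclass estimate of the reliability of the mean of all judges."""
    return (ms_students - ms_error) / ms_students


# Mean squares from Table 6: Students = 9.3, Error = .7, three judges.
r_single = single_judge_reliability(9.3, 0.7, n_judges=3)
r_mean = mean_rating_reliability(9.3, 0.7)
```

Under this assumption the single-judge estimate rounds to .80, which agrees
with the value given in the text; the estimate for the three-judge mean
(the MRATE criterion) would be somewhat higher.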

Relationship between judges' ratings and scoring methods. Table 7
shows the Spearman rank-order coefficients between each of the
individual total performance scores (PROPS, PROPM, Total 1, and
Total 2) and MRATE. In terms of its relationship with the other scoring
methods and MRATE, PROPS was clearly the least adequate total score.
This is not surprising, since this scoring method does not use
important information on the differential number of moves that are less
than the maximum allowed. The highest relationship between scores was
between Total 1 and Total 2 (r = .96); these scores undoubtedly are so
similar in this study because the test was not adaptive. Most students
attempted the same problems, so that the Total 2 adjustment for the
difficulty level of the problems attempted did not differentiate
between students. In an adaptive test where students converged on
problems of varying difficulty levels, performance as indexed by
Total 1 and Total 2 would be expected to differ appreciably.


Table 7

Spearman Rank-Order Correlations Between Individual
Total Performance Scores and Mean Performance Ratings

Score       PROPS    PROPM    Total 1    Total 2

PROPS
PROPM        .71
Total 1     -.79     -.88
Total 2     -.74     -.81       .96
MRATE        .68      .87      -.85       -.89

Note. All Spearman coefficients significant at p < .001.

Although the correlation of these two scores was high, examination
of the students who were classified as the best performers by each
score showed that they did evaluate performance differently. The top
10 students on each score were essentially the same group, with the
exception of three students who had the top three Total 1 scores but
ranked 14 through 16 on Total 2 scores. All three of these students
worked only on the easier problems and solved them all in close to the
optimal number of moves; as a result, their Total 1 scores were high.
However, many students who did well on the more difficult problems
received higher Total 2 scores as well, because such scores take into
account the difficulty level of problems attempted.
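
The way the two totals can rank students differently is illustrated by the
sketch below. The two students are hypothetical; the Special Difficulty
Index values are the mean Score 1 values reported in Table 4 for Problems
3, 11, 12, and 13. Lower values are better on both totals.

```python
# Special Difficulty Index values (mean Score 1) taken from Table 4.
SDI = {3: 1.01, 11: 1.11, 12: 1.50, 13: 1.51}


def totals(score1_by_problem):
    """Return (Total 1, Total 2) for one student's Score 1 values."""
    n = len(score1_by_problem)
    total1 = sum(score1_by_problem.values()) / n
    total2 = sum(s / SDI[p] for p, s in score1_by_problem.items()) / n
    return total1, total2


# Hypothetical students (illustration only, not study data).
easy_only = totals({3: 1.00, 11: 1.10})   # near-optimal on easy problems
hard_only = totals({12: 1.20, 13: 1.20})  # better than average on hard ones
```

In this example the student who worked only on easy problems has the better
Total 1, whereas the student who worked on the difficult problems has the
better Total 2, because Total 2 credits performance relative to the group
mean on each problem.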

If the judges' ratings, which examined each protocol in a more
comprehensive way, were used as a criterion against which to evaluate
the different scoring methods, Total 2 was slightly but not
significantly better than the PROPM and Total 1 scores. The judges, in
describing how they made their ratings, were clearly taking into
account not only the number of moves beyond the optimal number
(Total 1) but also the relative difficulty of the problems attempted by
each student; therefore, if students had worked on problems of more
varied difficulty levels, Total 2, which takes both these factors into
account, would seem likely to be even more clearly superior to PROPM
and Total 1.

Consistency of performance across problems. Important for the
usefulness of this problem type in assessing spatial problem-solving
ability is whether reliable individual differences on various
performance criteria can be identified across problem replications of
similar and varying difficulty levels. Table 8 shows the
intercorrelations of the total number of moves used by students (lower
triangle) and the intercorrelations of the number of illegal moves made
(upper triangle) across the 13 problem replications. The correlations
in the lower half of Table 8 fail to demonstrate strong consistency of
the Number of Moves performance measure across problems. That is, there
was not a consistent tendency for students to rank order themselves
similarly across problems on this performance score. Some small
clusters of statistically significant and moderate size correlations
existed between Problems 2 through 4, Problems 5 through 10, and to a
lesser extent between Problems 5, 10, 12, and 13. These moderate
positive correlations, which tend to be located near the diagonal,
suggest that although individual differences as indexed by total number
of moves were not very consistent for the particular set of problems
used here, consistency of performance was more likely to be obtained
across problems of more similar difficulty levels.

A probable reason for the lack of consistent performance across
problems is the small variation in performance for most of the problems
due to the overall easiness of the test. With the majority of students
solving many problems in the minimal or close to minimal number of
moves, the low variability of the performance scores across problems
would greatly decrease correlations.
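
The effect of restricted variability on a correlation can be
illustrated with a minimal sketch. The scores below are invented for
illustration and are not data from this study; the floor of 5 is an
arbitrary stand-in for a problem that most students solve in the
minimal number of moves.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def extra_moves(latent, floor):
    """Moves beyond the optimum; anyone at or below the floor scores 0."""
    return [max(0, v - floor) for v in latent]

# Hypothetical latent "inefficiency" of eight students on two problems.
latent_a = [0, 1, 2, 3, 4, 5, 6, 7]
latent_b = [1, 0, 3, 2, 5, 4, 7, 6]

r_full = pearson_r(latent_a, latent_b)
# On an easy problem most students land on the floor, so scores bunch up.
r_easy = pearson_r(extra_moves(latent_a, 5), extra_moves(latent_b, 5))
print(round(r_full, 2), round(r_easy, 2))
```

In this invented example the correlation computed on the bunched
scores is lower than the correlation on the underlying scores, which
is the direction of the effect described above.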

Similarly,  there  was  not  a  strong  tendency  for  the  same  students 
to  make  more  illegal  moves  across  problems,  as  indicated  in  the  upper 
half  of  Table  8.  However,  many  more  moderate  and  statistically  signif¬ 
icant  correlations  existed  than  would  be  expected  by  chance.  It  was 
originally  expected  that  the  number  of  illegal  moves  might  relate  to 
difficulty  in  understanding  the  instructions  and  problem  task.  The 
small  number  of  illegal  moves  made  by  students  on  most  problems  (see 
Table  2),  however,  not  only  decreased  the  likelihood  of  large  correla¬ 
tions  across  problems  but  also  suggested  that  the  moderate  correlations 
that  did  appear  were  due  more  to  carelessness  on  the  part  of  some  stu¬ 
dents  in  entering  their  responses  on  the  CRT. 

From Table 2 it was also seen that there were very few repeated
moves made by students, indicative of backing up in the problem
solution. Not surprisingly, then, no strong consistency across problems
was found for this performance index (see correlation matrix in
Appendix Table F-1).

To examine the relationship between the number of legal moves
used, the number of illegal moves, and the number of repeated moves
within a single problem and across problems, the intercorrelation
matrices between these performance indices were computed (see Appendix
Tables F-2, F-3, and F-4). If all three indices were related to ability
to solve these problems, they should be related to each other within
and across problems. Examination of the intercorrelation matrices
demonstrated that the number of total, illegal, or repeated moves on
the same or on a different problem were not highly correlated, with the
exception that within the same problem the number of repeated moves
correlated moderately highly (average r = .45) with the number of total
moves (see Appendix Table F-3). This latter relationship is not
surprising, since it is a part-whole correlation, with the number of
repeated moves being included in the total number of moves.

Another way to examine consistency of performance is to relate
performance on individual problems with performance on the test as a
whole, as indexed by various total scores. These "item-total"
correlations, shown in Table 9, can assist in selecting the problems
that are most discriminating. In Table 9 the five or six highest
correlations in each row are underlined. These data indicate that
generally problems in the middle range of difficulty (Problems 4 to 10)
were most discriminating. Since correlations between individual problem
scores and the four alternative total scores are to varying degrees
part-whole correlations, the last two rows of Table 9 show the
correlations between a problem score and the total score on the
remaining problems using the two total scores discussed earlier as
being the most promising (Total 1, Total 2). Considering that the
problem-excluded total scores consist of only 12 "items" and that the
easiest and most difficult problems were not very discriminating, some
of the correlations are encouraging. The data suggest that if several
problems can be tailored to the same difficulty level (see discussion
of Table 3 above), one appropriate for each individual student,
improved reliability may be obtained.
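
The distinction between a part-whole correlation and a
problem-excluded correlation can be sketched as follows. This is an
illustration only: the scores are hypothetical, and a simple sum of
problem scores stands in for the total scores used in this report
(Total 1 and Total 2 are defined differently).

```python
import math

def pearson_r(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_total_correlations(scores, item):
    """Return (part-whole r, problem-excluded r) for one problem.

    scores holds one list of problem scores per student.
    """
    item_scores = [row[item] for row in scores]
    totals = [sum(row) for row in scores]
    # Remove the problem's own score so it cannot inflate the correlation.
    rest = [t - s for t, s in zip(totals, item_scores)]
    return pearson_r(item_scores, totals), pearson_r(item_scores, rest)

# Hypothetical scores for four students on three problems.
scores = [[1, 2, 3], [2, 1, 4], [3, 4, 3], [4, 3, 6]]
r_whole, r_rest = item_total_correlations(scores, 0)
print(round(r_whole, 2), round(r_rest, 2))
```

For these invented scores the problem-excluded correlation is smaller
than the part-whole correlation, the same pattern seen when the last
two rows of Table 9 are compared with the Total 1 and Total 2 rows.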


Table 9

Product-Moment Correlations Between Individual Problem Scores
(Score 1, Score 2*) and Several Total Test Scores, by Problem

                                            Problem
Total Score      1     2     3     4     5     6     7     8     9    10    11    12    13

PROPS          -.07  -.21  -.20  -.41  -.36  -.47  -.24  -.31  -.47  -.38  -.25  -.36  -.42
PROPM          -.11  -.32  -.18  -.44  -.35  -.36  -.58  -.29  -.46  -.40  -.21  -.37  -.38
Total 1         .06   .35   .26   .53   .55   .49   .60   .41   .49   .43   .26   .33   .27
Total 2         .06   .36   .27   .56   .61   .49   .61   .43   .44   .43   .32   .28   .22

Problem Excluded
Total 1         .04   .14   .24   .30   .30   .31   .39   .18   .26   .27   .15   .18   .10
Total 2         .02   .11   .23   .32   .39   .29   .42   .21   .24   .28   .19   .16   .10

Note. If |r| > .36, p < .01; if |r| > .27, p < .05.

*Since Score 1 and Score 2 were linear transformations of each other,
correlations with total scores were identical.


Response  Latencies 

Consistency of latency measures across problems. Table 10 shows
the intercorrelations of initial response latencies (lower triangle)
and average response latencies (upper triangle) across all 13 problems.
The initial latency correlations showed a moderate to strong tendency
for individuals to be consistent in the amount of time they spent in
initial study of a problem prior to their first move. There was an
even stronger tendency for the average time per move to be consistent
across problems, with most of the correlations in the .30 to .50 range
and many in the .70 to .80 range.

Table  11  shows  the  intercorrelations  of  the  total  time  spent  on 
each  problem.  These  data  indicate  a  moderate  relationship  across  most 
problems. 

Thus, there seemed to be a substantial degree of consistency in
the initial, average, and total time taken by individuals in working on
these problems. The response latency measures may tap differences in
the cognitive style of reflectivity versus impulsiveness (Kagan, 1965;
Kagan et al., 1964) or the degree of planning by the student. Since
all three correlation matrices (initial, average, and total latencies)
showed a slight tendency for the correlations to be largest near the
diagonal, the work strategy or style of each student may vary somewhat
at different points in the test, being more consistent for problems
that are worked on closer to each other in time.

The response latency measures may also reflect individual
differences in the speed of spatial information processing, which in
this case represents the efficiency with which a sequence of moves can
be traced out visually and maintained in memory. Such differences may
or may not show up in the performance measures, since students may
compensate for slower information processing speeds with more care and
slower response latencies.

[Table 10: Intercorrelations of initial response latencies (lower
triangle) and average response latencies (upper triangle) across the
13 problems. Entries are not recoverable from the scanned source.]


Table 11

Intercorrelations of Total Response Latency for 13 Problems

                                       Problem
Problem       1     2     3     4     5     6     7     8     9    10    11    12

  2    N     33
       r    .48
  3    N     33    33
       r    .51   .55
  4    N     54    33    33
       r    .17   .46   .52
  5    N     54    32    32    53
       r    .20   .30   .45   .58
  6    N     54    32    32    53    54
       r    .25   .36   .59   .30   .35
  7    N     54    32    32    53    53    53
       r    .22   .29   .25   .28   .23   .33
  8    N     50    30    30    49    49    49    50
       r   -.07  -.02  -.12   .46   .61   .23   .20
  9    N     46    26    28    45    45    45    46    46
       r   -.01  -.27   .18   .04   .22   .45   .19   .11
 10    N     44    28    28    43    43    43    44    44    44
       r    .20   .05   .30   .08   .40   .26   .35   .13   .42
 11    N     49    28    28    48    48    48    49    48    45    43
       r    .30   .46   .27   .46   .47   .04   .42   .19   .18   .41
 12    N     43    28    28    47    47    47    48    47    44    42    47
       r    .11   .13   .31  -.01   .13   .20   .04  -.00   .42   .65   .13
 13    N     49    29    29    49    49    49    49    48    45    44    48    47
       r    .11   .16   .33   .26   .32   .23   .18   .14   .41   .47   .18   .51
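
The differing N values reported for each pair of problems in Table 11
suggest that each correlation was based on the students who had latency
data for both problems. A minimal sketch of such a pairwise
computation follows; the function name and the latency values are
hypothetical, and the use of pairwise deletion is an inference from the
table, not a procedure stated in this report.

```python
import math

def pairwise_r(x, y):
    """Correlate two lists, using only students with both values present.

    Missing values are given as None. Returns (N, r).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return n, sxy / math.sqrt(sxx * syy)

# Hypothetical total latencies (seconds) on two problems for five students;
# None marks a student with no recorded time on that problem.
time_p1 = [10, 20, None, 40, 50]
time_p2 = [20, 40, 60, None, 100]

n, r = pairwise_r(time_p1, time_p2)
print(n, round(r, 2))
```

Reporting N alongside each r, as Table 11 does, makes clear how many
students contributed to each coefficient.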

Latency  trends.  Figure  2  shows  plots  of  response  latencies  in 
seconds  (vertical  axis)  versus  the  numbered  moves  (horizontal  axis)  for 
sampled  problems  for  two  students  who  performed  very  well  on  the  test 
as  a  whole  and  for  two  students  who  performed  poorly,  based  on  MRATE. 

In each graph an * indicates where the plot would have ended had the
problem been solved in the optimal number of moves. Graphs which
continue beyond the 2?th move at the right end of the horizontal axis
were not solved by the student.

The graphs shown here suggest that good problem solvers (Students
A and B) had larger initial study times for Move 1. Although this
seemed to be the case for some of the good problem solvers, typical
initial study times for other good problem solvers indicated that this
was not a consistent trend. Most of the latency graphs examined did
seem to be characterized as follows:

1.  Generally, initial latencies were longer than the latencies
for subsequent moves.

2.  "Spikes" in the graphs frequently occurred every several
moves, indicating that the student was restudying the problem
and/or evaluating his or her progress. Although not analyzed
systematically, some students' graphs (e.g., Student A in
Figure 2) seem to be characterized by higher spikes than
others.

3.  For problems that were solved, latencies typically dropped to
2 to 4 seconds for the last 3 to 6 moves, indicating that the
solution path had been discovered. This finding may be
consistent with short-term memory capacity research, which
indicates that somewhat fewer than seven "chunks" of information
can be maintained in short-term memory while other cognitive
resources are being allocated simultaneously (Kintsch, 1977,
p. 199).

4.  Poorly solved problems often showed a conspicuous absence of
spikes or restudy points. In Figure 2 Problems 10 and 12 for
Student C and Problem 8 for Student B exemplified this point.
On the other hand, there were problems solved poorly which did
contain spikes or restudy points (e.g., Problem 13 for Student
C), indicating that the student was trying to get back on the
right track.


Figure 2

Latency Trends for Two Good Test Performers and Two Poor Test Performers
Based on Judges' Performance Ratings (MRATE)
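
The report notes that spikes were not analyzed systematically. One
way such restudy points might be flagged is sketched below. The rule
used here (a latency more than twice the median latency of all moves
after the first) and the latency values are assumptions made for
illustration; they are not a procedure or data from this study.

```python
from statistics import median

def find_spikes(latencies, factor=2.0):
    """Return 1-based move numbers whose latency looks like a restudy point.

    The first move is excluded because initial study time is usually long.
    A later move is flagged when its latency exceeds factor times the
    median latency of all moves after the first.
    """
    later = latencies[1:]
    if not later:
        return []
    threshold = factor * median(later)
    # Index 0 of `later` is Move 2, hence the offset of 2.
    return [i + 2 for i, t in enumerate(later) if t > threshold]

# Hypothetical latencies in seconds for a ten-move solution.
latencies = [12.0, 3.0, 2.5, 9.0, 3.0, 2.0, 8.5, 2.0, 2.0, 2.0]
print(find_spikes(latencies))  # moves 4 and 7 exceed the 5.0-second threshold
```

A count of flagged moves per problem would give a rough index of how
often a student paused to restudy, which could then be compared across
well solved and poorly solved problems.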

Overall,  some  trends  were  suggestive,  but  they  were  by  no  means 
universal.  Although  perhaps  providing  clues  to  the  work  styles  of  some 
students  (e.g.,  impulsive  responding  with  few,  if  any,  study  points), 
the  latency  trends  appeared  to  be  too  idiosyncratic  to  be  very  useful 
from  a  psychometric  point  of  view. 

Relationship between Performance and Response Latencies

The correlations between the number of moves students used and the
initial and average move latencies for each problem indicated no
relationship between these latency measures and performance with a
single problem. Similarly, when initial and mean latencies for each
problem were correlated with total scores (Total 2) and MRATE, no
significant correlations were found (see Appendix Tables F-5 and F-6).

Not surprisingly, problems that were not solved well took longer
than problems solved well, as indicated by the first row of Table 12,
which shows the correlation of total time spent on each problem with
the number of moves needed (and, hence, the individual problem scores
Score 1 and Score 2). This relationship held for all problems except
Problems 1, 3, and 12; comparison with the difficulty index in Table 2
shows that the relationship was strongest for problems of middle
difficulty levels (Problems 4, 5, 7, 8, and 11). When total problem
time for each problem was correlated with students' total test
performance, as indexed by Total 2 and MRATE, these same problems
related most highly.


Table 12

Product-Moment Correlations Between Total Time Spent on Each Problem
and Performance Measures, by Problem

Performance                                   Problem
Measure           1     2     3     4     5     6     7     8     9    10    11    12    13

Individual
  Problem Score  05   61**   09   73**  62**  42**  75**  67**  49**  39**  63**   27   40**
Total 2         -12    24   -09   35**  34**   09   41**  35**   13    02   28**  -14   -17
MRATE            12   -28    03   -31  -33**  -14  -37**  -31*  -30*  -08   -27   -04    00

Note. Decimal points are omitted.
*Statistically different from zero at p < .05.
**Statistically different from zero at p < .01.

These  data  indicate  that  with  the  exception  of  the  total  latency, 
or  time,  spent  on  a  problem,  the  response  latencies  did  not  show  any 
consistent  meaningful  relationship  to  performance. 

Biographical Data

Table  13  shows  the  frequency  and  percentage  of  students  endorsing 
various  response  alternatives  to  questions  about  prior  experience,  per¬ 
ceived  difficulty,  motivation  level,  and  self-evaluation.  Regarding 
prior  experience  with  this  problem  type,  Question  1  indicates  that  40% 
of  the  students  had  never  worked  on  this  problem  type,  58%  had  done  so 
a  few  times,  and  only  2%  had  worked  such  problems  many  times. 

Describing how students are to solve these problems and enter
their moves in a sequence of computerized instructions has certain
difficulties, but the responses to Question 2 indicate that nearly all
students had little or no difficulty understanding the instructions.
Most students thought half or more than half of the problems were
rather easy (Question 3), were not at all or only slightly nervous
(Question 6), and either enjoyed working on the problems or were
neutral about it (Question 8). Responses to Question 4 suggest that
the instructional sequence and experimental conditions did not succeed
in motivating most students to try hard to solve all of the puzzles in
as few moves as possible. This less than optimal motivation under
conditions where the test has no particular importance to the student
is probably more of a problem for tests of this type than for more
traditional psychometric measures, since each item or problem requires
more perseverance.

It is difficult to say how much the scores in this study were
affected by some students being less concerned about optimal
performance. However, to examine this question with the data available,
the mean total score (Total 2) and MRATE of students responding to
Question 4 with "a" (mean Total 2 = .96, mean MRATE = 5.59), "b" (mean
Total 2 = 1.02, mean MRATE = 4.93), and "c" (mean Total 2 = 1.03, mean
MRATE = 4.99) were compared and no significant differences found.

Question 5 indicates that about half of the students thought the
length of the test affected their motivation. Finally, 56% of the
students thought they did fairly well on the test, 30% thought they did
not do very well, and 10% had no idea how well they had done (Question
7). For future research with this type of test, it would be of
interest to have the computer ask some of these questions during actual
testing so that students' motivation, anxiety, difficulty perception,
and confidence could be related to the simultaneous quality of their
solutions.

It is important to know to what extent a test measures prior
experience with the assigned tasks. Differences in test performance due
to prior experience may be desirable or undesirable depending on the
test application. In this study, a general measure of spatial
reasoning ability was sought so that performance scores would not be
significantly determined by prior experience with any specific spatial
task.

A comparison of the mean Total 2 score (.96) and performance ratings
(5.55) for the 20 students who reported no prior experience with this
problem type and of the mean Total 2 score (1.04) and performance
ratings (4.73) for the 30 students who reported having worked such
problems a few or many times (see Question 1) showed no significant
performance differences based on stated prior experience. Similarly, a
comparison of male and female mean Total 2 scores (1.00 versus 1.00)
and mean ratings (5.09 versus 5.25) also showed no statistically
significant differences.


Table 13

                                                                    Frequency  Percentage

1.  Before today, how often have you worked on this kind of puzzle?
      a. never                                                          20        40
      b. a few times                                                    29        58
      c. many times                                                      1         2

2.  How much difficulty did you have understanding the instructions
    before starting the puzzles?
      a. no difficulty                                                  39        78
      b. a little difficulty                                            10        20
      c. much difficulty                                                 1         2

3.  Which of the following best describes how difficult you thought
    the puzzles were?
      a. All of the puzzles were easy                                    3         6
      b. A few puzzles were difficult, the rest were rather easy        27        54
      c. About half the puzzles were easy and half were difficult       15        30
      d. A few puzzles were easy, the rest were rather difficult         5        10
      e. All of the puzzles were difficult                               0         0

4.  Which of the following best describes your attitude towards
    completing the puzzles?
      a. I tried hard to solve all puzzles in as few moves as
         possible                                                       18        36.7
      b. I tried hard to solve most but not all of the puzzles in
         as few moves as possible                                       19        38.8
      c. I tried to solve the puzzles, but was not very concerned
         about using as few moves as possible                           12        24.5
      d. I didn't care whether I solved the puzzles or not               0         0

5.  Did the length of the test affect your motivation?
      a. not at all                                                     19        38
      b. somewhat                                                       26        52
      c. quite a bit                                                     5        10

6.  Were you nervous or uncomfortable while working on the puzzles?
      a. not at all                                                     33        66
      b. somewhat                                                       17        34
      c. very much so                                                    0         0

7.  How well do you think you did on the puzzles?
      a. very well                                                       2         4
      b. fairly well                                                    28        56
      c. not very well                                                  15        30
      d. I don't really know                                             5        10

8.  How did you feel about working on the puzzles?
      a. I disliked it a lot                                             3         6
      b. I disliked it somewhat                                          4         8
      c. I felt neutral about it                                        11        22
      d. I enjoyed it somewhat                                          26        52
      e. I enjoyed it a lot                                              6        12

Perceived Difficulty Ratings
Dimensions  of  Perceived  Difficulty 

Table 14 shows the proportion of students reporting voluntary use
of various rating dimensions in their protocols while sorting the
stimuli and the proportion of students selecting each dimension from a
prepared list of dimensions provided by the experimenter after the
sorting was completed (see Appendix D for the rating booklet). The last
column in Table 14 shows the percentage distribution of frequencies
with which each of the dimensions in the prepared list was used. Table
14 shows that all dimensions were reported less frequently in the free
response voluntary protocol situation than when the prepared list was
used. This would be expected, since some students might not have
thought to report a dimension they might recall using when prompted
later. However, the large discrepancy between these two columns for
Dimensions h (number of columns not matching) and i (number of rows not
matching) would suggest that these two dimensions were not very
salient, despite the high proportion of students endorsing these
dimensions post hoc. The number of students endorsing the supposedly
irrelevant Dimensions j and l under the prepared list conditions,
compared to the near absence of these dimensions in the volunteered
protocols, further suggests that something like social desirability
responding was occurring in the prepared list condition.

An examination of the percentage distribution data in the last
column indicates these less relevant dimensions were most often
reported as being used in Some or None of the problems. It seems likely
that if students endorsed prepared dimensions that had not actually
been used or that were not the most salient, they would endorse the
Some category rather than the All or Most categories. On the other
hand, the dimensions reported as being used most often in the voluntary
protocols were, with the exception of Dimension c, endorsed most
heavily in the All or Most categories in the prepared list. Thus, the
data from the voluntary protocols, in conjunction with the All and Most
categories in the frequency ratings, would seem to be the best
indicators of the most salient rating dimensions that students thought
they were using.

From Table 14 it is clear that the most salient rating dimension
was Dimension a, the number of moves required to solve the puzzle
(i.e., the solution path length). Ninety-three percent of the students
voluntarily reported this dimension in some form in their protocols,
and virtually all students (98%) selected the dimension in the prepared
list condition. The other most salient dimensions were Dimension b,
whether the student could "see" the solution or not; Dimension c, the
number of squares not matching in the two patterns; Dimension d, the
time the student felt it would take to solve the puzzle; Dimension e,
the type or nature of moves required; and Dimension f, how far apart
certain numbers were in the two patterns. The relative rank ordering
or salience of these dimensions would be difficult to justify, since
they are not independent, and a student reporting the number of squares
not matching in his or her protocol could have been taking the distance
between squares into account as well, without explicitly reporting this
dimension.


Table 14

Dimensions Used in Rating Perceived Difficulty

                                   Percentage of Students      Percentage of Time Students
                                   Reporting Using the         Reported Using the Dimension
                                   Dimension on at Least       in Rating the Problems
                                   Some of the Problems
                                   Voluntary    Prepared
Dimension                          Protocols    List           All    Most    Some    None

a. The number or explication
   of moves                           93           98           35     35      28       2
b. Whether can "see" solution         68          100           58     28      14       0
c. Number of squares not
   matching                           58           91           26     30      35       9
d. Amount of time to solve            50           93           53     26      14       7
e. Types of moves required            50           --           --     --      --      --
f. How far apart certain
   numbers were                       43           86           14     30      42      14
g. How much thought required          32           --           --     --      --      --
h. The number of columns not
   matching                           18           72           19     23      30      28
i. Number of rows not matching        11           81           21     26      35      19
j. Location of empty space in
   left pattern                        7           63           16     19      28      37
k. Similarity to already
   solved puzzle                       2           --           --     --      --      --
l. Whether one pattern was in
   numeric order from 1 to 15          0           39            2      2      35      60

Note. Missing entries are for dimensions not included in the prepared
list but reported by some students in their voluntary protocols.


A further question can be raised as to whether some of these
reported dimensions are really rating dimensions underlying difficulty
judgments or are actually synonymous with difficulty itself. This
would seem to be the case with Dimensions b and d in Table 14. If
students had been asked to rate whether they could readily see the
solution or how much time it would take to solve the puzzle, the rating
task might be equivalent to rating the difficulty; and such physical
problem characteristics of each puzzle as path length and the distance
between various numbers would probably underlie these judgments as
well. It would seem, then, that the dimensions most important for
students in evaluating the difficulty of these problems were the
solution path length, the number of squares not matching in the two
patterns, and the distance dimension of how far apart certain squares
were in the two patterns. Since no dimension was used for all problems,
it seems likely that the relative importance of each dimension varied
somewhat for each problem, depending on the particular pattern
configurations to be compared.

Individual Differences in Mean Perceived Difficulty


Table 15 summarizes the mean difficulty ratings for each of the
four 15-puzzle problem sets separately. These data show that there
were substantial individual differences in the level and variation of
difficulty perceptions, even for the same problems. For example, for
Stimulus Set 1, although the average student thought the problems were
Easy or Somewhat Easy, one student thought the average problem in the
set was Very Easy and another thought the average problem was Somewhat
Difficult. Individual differences in perceived difficulty of the
problems within stimulus sets were also evidenced, since about
two-thirds of the students utilized all six rating categories, but
about one-third utilized only the four easiest categories, and one
student rated all stimuli with the two easiest categories. Without data
for the same students on an independent rating task irrelevant to the
difficulties rated here, it is not possible to determine to what extent
these individual differences reflect response biases in the use of
category rating scales; but it seems reasonable to assume that the
differences found do indicate some true perceptual differences in
perceived difficulty. Presumably, these differences reflect individual
differences in the ability to visualize and to maintain a sequence of
moves in short-term memory.


Table 15

Individual Differences in Mean Difficulty Perception

Stimulus                      Individual Mean Ratings
Set       Lowest            Mean                       Highest

1         1.12 Very Easy    2.57 Easy/Somewhat Easy    3.77 Somewhat Difficult
2         1.94 Easy         3.13 Somewhat Easy         3.82 Somewhat Difficult
3         1.69 Easy         2.63 Somewhat Easy         3.44 Somewhat Easy/
                                                            Somewhat Difficult
4         1.76 Easy         2.86 Somewhat Easy         4.12 Somewhat Difficult



Perceived Difficulty and Number of Moves

That the obtained individual differences in perception seem to be
reliable is suggested by the data in Figure 3, which shows the perceived
difficulty ratings of four students within Problems 9 and 10 as
the  distance  in  moves  from  the  start  puzzle  configuration  approached 
the  goal  puzzle  configuration.  These  graphs  were  obtained  by  having 
students  rate  the  difficulty  of  reaching  the  goal,  not  only  from  the 
start  configuration,  but  from  various  intermediary  configurations  be¬ 
tween  the  start  and  goal  configuration.  Thus,  for  example,  in  Figure 


-  42 


3a  It  might  he  presumed  that  if  Student  9  were  actually  attempting  to 
solve  Problem  9,  the  puzzle  would  look  Somewhat  Difficult  to  him  or  her 
until  he  or  she  was  about  7  moves  away  from  the  goal*  then  difficulty 
would  drop  off  rapidly  until  he  or  she  was  4  or  5  moves  from  the  goal, 
when  the  puzzle  would  appear  to  be  Very  Easy. 

Note  that  across  both  the  problems  shown  in  Figure  3,  the  four 
students  show  marked  consistency  in  how  they  perceived  the  difficulty 
of different puzzle distances. For example, Student 4 perceived both
problems  as  easier  than  the  mean  student  at  all  distances  from  the 
goal,  whereas  Student  6  perceived  both  problems  as  more  difficult  than 
the  mean  student  did  at  all  distances  from  the  goal.  Even  though  only 
a  few  examples  of  students  and  problems  are  shown  in  Figure  3,  this 
tendency  for  reliable  individual  differences  in  difficulty  perceptions 
was  present  in  nearly  all  combinations  of  students  and  problems  exam¬ 
ined.  These  data  suggest  that  if  the  differences  in  difficulty  percep¬ 
tions  relate  to  performance,  then  reliable  individual  performance  dif¬ 
ferences  in  solving  these  problems  should  be  obtainable. 

Relationship of Difficulty Perception and Path Length

Since path length seemed to be a dominant dimension in the student
protocols, difficulty perception scale values were correlated and plotted
against path length for all 67 puzzles. Figure 4 shows the scatterplot
relating solution path length of each puzzle to its mean scale
value. Although the correlation between the two variables was .88, the
relationship was not strictly linear at the right, or high, end of the
plot. Although end effects must always be considered in category rating
scales, the fact that students could have assigned higher ratings at
the high end of the curve would suggest that the flattening of the
curve for long path lengths represents a real effect. Students
apparently could not discriminate among path lengths greater than about
16. Perhaps a secondary rating dimension, such as the distance between
numbers in the pattern or the number of squares not matching in the two
patterns, is important in differentiating problems with longer path
lengths.

Figure 4

Bivariate Distribution of Perceived Difficulty

Figure 4 also provides estimates of how difficult puzzles with
different path lengths will appear to the average student when beginning
work on a problem. A puzzle perceived to be Very Easy would correspond
to a value on the vertical axis in Figure 4 between .5 and 1.5;
Easy puzzles would range from 1.5 to 2.5; Somewhat Easy, from 2.5 to
3.5; Somewhat Difficult, from 3.5 to 4.5; Difficult, from 4.5 to 5.5;
and Very Difficult, from 5.5 to 6.5. Solution path lengths corresponding
to the difficulty categories overlapped somewhat, with Very Easy
ratings corresponding to puzzles requiring 1 to 7 moves each, Easy for
puzzles of 4 to 10 moves, Somewhat Easy for puzzles ranging from 6 to
16 moves, Somewhat Difficult for puzzles requiring 8 to 18 moves, and
Difficult for puzzles with from 16 to 26 moves. None of the puzzles
used here, which ranged from 1 to 26 moves, was rated Very Difficult
by the average student.

Figure 5

Bivariate Distribution of Standard Deviations of
Perceived Difficulty Ratings and Path Length for 67 Puzzles
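The category bands and path-length ranges quoted above can be written down as a small lookup. The following is only a sketch: the function and constant names are hypothetical, and the treatment of a value that falls exactly on a band boundary (assigned here to the higher band) is an assumption, since the report does not say how such values were labeled.

```python
import math

# Category labels in rating order; band k covers scale values k-0.5 to k+0.5,
# as quoted in the text (.5-1.5 = Very Easy, ..., 5.5-6.5 = Very Difficult).
CATEGORIES = ["Very Easy", "Easy", "Somewhat Easy",
              "Somewhat Difficult", "Difficult", "Very Difficult"]

# Solution path lengths (in moves) reported for each category.
# No puzzle was rated Very Difficult, so that category has no range.
PATH_RANGES = {
    "Very Easy": (1, 7),
    "Easy": (4, 10),
    "Somewhat Easy": (6, 16),
    "Somewhat Difficult": (8, 18),
    "Difficult": (16, 26),
}


def category_for(scale_value):
    """Return the category band containing a mean scale value.

    Assumption: a value exactly on a boundary goes to the higher band.
    """
    if not 0.5 <= scale_value < 6.5:
        raise ValueError("scale value must lie in [0.5, 6.5)")
    return CATEGORIES[math.floor(scale_value + 0.5) - 1]


def categories_for_path_length(moves):
    """List every category whose reported path-length range includes `moves`."""
    return [name for name in CATEGORIES
            if name in PATH_RANGES
            and PATH_RANGES[name][0] <= moves <= PATH_RANGES[name][1]]
```

Because the reported ranges overlap, a single path length can map to more than one category; a 16-move puzzle, for instance, falls inside three of the ranges.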

Figure  5  shows  a  plot  of  solution  path  lengths  versus  the  standard 
deviations  of  the  students'  category  ratings.  These  data  demonstrate 
that  although  students  tended  to  agree  more  in  their  difficulty  percep¬ 
tions  for  stimuli  with  short  or  very  long  solution  paths,  there  was 
substantial  disagreement  in  perceived  difficulty  for  puzzles  with  path 
lengths  in  the  middle  range,  with  a  peak  disagreement  for  solution 
paths  of  about  10  moves. 


DISCUSSION  AND  CONCLUSIONS 


Problem  Characteristics 

The  data  suggested  that  four  performance  indices  might  be  useful 
in  indexing  problem  difficulty:  (1)  the  mean  number  of  moves  in  the 
sample,  (2)  the  proportion  of  students  solving  a  problem,  (3)  the  pro¬ 
portion  of  students  solving  a  problem  in  the  optimal  number  of  moves, 
and  (4)  the  Special  Difficulty  Index.  These  four  indices  showed  sub¬ 
stantial  agreement  in  rank  ordering  the  difficulty  of  the  problems. 
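As a rough illustration, the first three indices can be computed from per-student records for a single problem. This is a sketch under stated assumptions: the record layout and function name are hypothetical, and the mean number of moves is taken over all students who attempted the problem, which the report does not specify. The Special Difficulty Index is omitted because its formula is not given in this section.

```python
from statistics import mean


def difficulty_indices(records, optimal_moves):
    """Compute three difficulty indices for one problem.

    records: list of (moves_used, solved) tuples, one per student
             (hypothetical layout, not the report's data format).
    optimal_moves: length of the shortest solution path.
    """
    n = len(records)
    solved_counts = [moves for moves, solved in records if solved]
    return {
        # (1) mean number of moves in the sample (assumed: all attempts)
        "mean_moves": mean(moves for moves, _ in records),
        # (2) proportion of students solving the problem
        "prop_solved": len(solved_counts) / n,
        # (3) proportion solving it in the optimal number of moves
        "prop_optimal": sum(1 for m in solved_counts if m == optimal_moves) / n,
    }
```

With four students who used 4, 4, 6, and 10 moves, the last without reaching the goal, on a problem whose optimal path is 4 moves, the three indices would be 6 moves, .75, and .50.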

Because  it  adjusts  for  differences  in  solution  path  length  while 
also  taking  into  account  the  average  number  of  moves  required  by  the 
sample,  the  Special  Difficulty  Index  not  only  appeared  to  be  the  best 
index  of  problem  difficulty  but  also  correlated  lower  with  the  solution 
path  length  of  each  problem  than  the  other  performance  indices  used  to 
estimate  problem  difficulty.  This  is  a  desirable  situation,  since 
longer  puzzles  were  not  always  the  most  difficult.  Future  research 
with  this  problem  type  should  consider  use  of  some  short,  but  less 
direct  or  obvious,  problems. 

The numbers of illegal and repeated moves were found to be too low
and  not  consistent  enough  for  individuals  across  problems  to  be  useful 
performance  indices,  at  least  for  this  problem  set  and  sample. 

Examination  of  problem  performance  indices,  the  Special  Difficulty 
Index,  and  students'  perceptions  of  the  difficulty  of  the  test  problems 
indicated  that  with  the  exception  of  Problems  S,  10,  12,  and  13,  the 
problems  were  too  easy  for  most  students.  For  example,  except  for 
these  four  problems,  70%  or  more  of  the  students  solved  each  of  the 
remaining  problems  in  the  minimum  number  of  moves.  It  seems  likely 
that  these  highly  skewed  distributions  of  number  of  moves  to  completion 
precluded  high  correlations  of  individual  performance  indices  across 
problems,  since  small  absolute  differences  in  scores  across  problems 
would  be  accentuated.  Thus,  the  consistency  across  problems  of  the 
number  of  moves  to  completion  was  generally  poor,  with  indications  of 
only  small  to  moderate  consistency  for  clusters  of  problems  of  similar 
difficulty. It is possible that if a more difficult set of problems
that were more similar in difficulty levels were administered, better
measures of consistency of performance would be obtained. The item-total
score correlations obtained for each problem suggested that it
would  be  possible  to  obtain  a  more  discriminating  subset  of  problems. 
Because  this  was  an  exploratory  study,  however,  no  preselection  of 
problems  was  possible.  Since  the  data  suggest  that  better  consistency 
may  be  obtained  using  problems  of  similar  difficulty  levels,  an  adap¬ 
tive  test,  which  tailors  problems  to  the  ability  level  of  each  student, 
should  increase  the  reliability  of  measurement. 

Scoring  Methods 

Four  alternative  methods  of  scoring  total  test  performance  and  two 
methods  of  scoring  individual  problem  performance  were  studied.  The 
scores  that  took  into  account  differential  numbers  of  moves  (Total  1, 
Total  2)  between  the  optimal  and  maximum  number  allowed  appeared  to  be 
the  best,  on  intuitive  grounds,  and  were  also  related  somewhat  more  to 
judges' performance ratings. The Total 2 score, which also took into
account the difficulty of the problems the student attempted, appeared
to be the most meaningful score. Where other methods rank ordered students
differently, the rank ordering provided by Total 2 was most
highly related to judges' performance ratings. Although Total 2 may
appear  to  be  additive  in  that  it  averages  individual  problem  scores 
(Score  2),  the  pattern  or  configuration  of  individual  problem  perfor¬ 
mance  is  taken  into  consideration,  since  the  individual  problem  scores 
(Score  2)  are  adjusted  for  the  difficulty  of  each  problem,  as  reflected 
in  the  mean  performance  of  the  sample  on  the  problem.  As  a  result, 
students  are  penalized  more  for  poor  performance  on  easier  problems, 
relative to the group, than they are on more difficult problems. In
this  way  students  who  solve  the  same  number  of  problems  but  have  dif¬ 
ferent  patterns  of  performance  will  obtain  different  Total  2  scores. 

Future  research  with  this  problem  type  will  require  study  of  the 
validity  of  the  various  performance  scores  against  relevant  external 
criteria.  Since  no  such  reliable  criteria  were  available  in  this 
study,  the  meaningfulness  of  the  scores  was  tentatively  determined  by 
comparing  these  objective  scores  with  judges'  performance  ratings  of 
test performance. Strong indications of concurrent validity were
found. Those cases in which the objective score ordered students differently
than the ratings indicated that whereas the objective score
(Total 2) penalized students more than judges' ratings for poor performance
on easier problems, the judges penalized students more for not
attempting  some  problems  (although  this  was  not  always  the  student's 
fault)  and  for  doing  poorly  on  more  difficult  problems.  Although  it  is 
difficult  to  determine  which  measure  is  more  valid  without  an  external 
criterion,  the  high  correlations  between  the  objective  scores  and  the 
judges'  ratings  suggest  some  validity  in  both  types  of  data. 

Latencies 

Mean initial and total latencies for each problem were strongly
related to some of the performance indices of problem difficulty. That
is, the group as a whole utilized longer initial study times and longer
total work times on more difficult problems. Similarly, problems that
took longer to solve were initially studied longer. The average latency
of moves within a problem did not relate to problem difficulty.

At the level of individual performance, only total latency or
problem  solution  time  was  related  to  problem  performance.  Some  good 
problem  solvers  were  characterized  by  very  long  initial  latencies,  but 
this  tendency  was  not  universal.  Many  good  problem  solvers  did  not 
initially study the problem longer than did the average poor problem
solver.  The  average  problem  response  latency  measure  did  not  relate  to 
individual  student  performances. 

Plots  of  latency  trends  across  problems  were  interesting  from  a 
descriptive  point  of  view  in  indicating  that  most  students'  trends 
showed  longer  initial  latencies  followed  by  a  few  quicker  moves,  occa¬ 
sional  spikes  indicating  re-evaluation  of  progress,  and  finally  several 
very  quick  final  moves  indicating  that  the  sequence  of  moves  to  solu¬ 
tion  had  been  detected.  However,  no  universal  trends  in  response  la¬ 
tencies  seemed  to  characterize  good  problem  solvers  versus  poor  problem 
solvers well enough to be useful in scoring or predicting individual
performance.  Latencies  in  this  study  seemed  to  confound  differences  in 
the  ability  to  visualize  a  sequence  of  moves  and  differences  in  stu¬ 
dents'  work  styles.  Strong  evidence  for  such  work  styles  was  found  in 
the  consistency  of  initial,  average,  and  total  response  latency  mea¬ 
sures  across  all  problems.  Students  who  took  longer  initial  study 
times,  longer  average  times  between  moves,  and  longer  total  work  times 
on  one  problem  showed  a  consistent  tendency  to  do  so  on  other  problems 
as  well. 

Thus,  while  the  response  latency  measures  were  predictive  of  prob¬ 
lem  difficulty  and  indicated  the  existence  of  consistent  styles  of 
problem-solving  behavior,  they  did  not  appear  to  be  useful  in  scoring 
individual  performance. 
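The three latency measures discussed in this section can be sketched from a list of move timestamps. The definitions below are assumptions made for illustration (initial latency as the time from problem presentation to the first move, average latency as the mean time between successive moves, total latency as the time from presentation to the last move); the report's exact operational definitions are not restated here, and the function name is hypothetical.

```python
def latency_measures(presented_at, move_times):
    """Return (initial, average, total) latency in seconds for one problem.

    presented_at: time the problem appeared on the screen.
    move_times:   ascending times at which each move was entered.
    """
    if not move_times:
        raise ValueError("at least one move is required")
    initial = move_times[0] - presented_at
    total = move_times[-1] - presented_at
    # Time between successive moves; empty when only one move was made.
    gaps = [later - earlier
            for earlier, later in zip(move_times, move_times[1:])]
    average = sum(gaps) / len(gaps) if gaps else 0.0
    return initial, average, total
```

A student whose moves arrive 12, 15, 16, and 20 seconds after presentation would show the pattern described above: a long initial latency (12 s) followed by quicker moves (mean gap of about 2.7 s) and a total latency of 20 s.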

and Biographical Correlates of Performance

Although  the  posttest  reaction  questionnaire  indicated  that  only 
40% of the students had never worked problems of this type before, mean
performance  scores  between  these  students  and  those  who  had  previously 
worked  such  problems  were  not  significantly  different. 

Only  36.7%  of  the  students  reported  trying  hard  to  solve  all  the 
problems  in  the  minimum  number  of  moves.  Slightly  more  students  said 
they  tried  hard  to  solve  most,  but  not  all,  of  the  problems.  Although 
mean  performance  differences  between  subgroups  reporting  different 
levels  of  motivation  were  not  significantly  different,  these  data  plus 
the  fact  that  52%  of  the  students  felt  their  motivation  was  affected  by 
the length of the test indicate that total testing times may need to be
shorter  for  this  type  of  task  than  for  tests  with  more  conventional 
item  formats. 

No sex differences in performance were found on this test. That
males typically show better spatial ability (Garai & Scheinfeld, 1968;
MacCoby & Jacklin, 1974) and restructuring ability (MacCoby, 1966;
Sweeney, 1953; Terman & Tyler, 1954; Tyler, 1965) would seem to predict
male superiority on this test. On the other hand, females have generally
been found to be less impulsive (MacCoby, 1966; Terman & Tyler,
1954; Tyler, 1965) and better in perceptual speed and fluency (Garai &
Scheinfeld, 1968). The failure to obtain sex differences with this
type of task will only be of concern once more reliable measurement is
achieved. At that time, hypothesized correlates of these problems
should be examined to determine whether scores index spatial reasoning,
restructuring, impulsivity, or some other psychological variable.

Dimensions of Perceived Difficulty

The  most  salient  dimensions  of  perceived  difficulty  were  the 
number of moves required to solve the puzzle, the number of squares not
matching  in  the  two  patterns,  and  the  distance  dimension  of  how  far 
apart  certain  squares  were  in  the  two  patterns.  Since  no  dimension  was 
reported  as  having  been  used  for  all  problems,  it  seems  likely  that  the 
relative  importance  of  each  dimension  varied  somewhat  for  each  problem, 
depending  on  the  particular  pattern  configurations. 

When  the  actual  values  of  these  dimensions  were  computed  for  the 
problems  used  in  the  computer-administered  test  (see  Appendix  Table  C), 
a  hypothesized  rank  ordering  of  problems  by  difficulty  was  obtained. 
These three rank orders were quite similar (.51 < r < .79) but were not
as  consistent  as  the  rank  orderings  for  difficulty  obtained  from  per¬ 
formance  indices  such  as  mean  number  of  moves,  proportion  of  students 
solving  the  problem,  and  the  Special  Difficulty  Index  (see  Table  3). 
Thus,  although  these  physical  dimensions  may  be  useful  as  a  tentative 
index  of  problem  difficulty  for  use  in  initial  problem  selection  prior 
to data collection, the performance measures should provide more precise
indices of difficulty once normative data can be obtained.
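Agreement between two rank orderings of the same problems can be checked with a rank correlation. The sketch below assumes a Spearman coefficient (a Pearson correlation computed on ranks, with tied values given their average rank); the report does not state which coefficient was used, and the function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation between two equal-length lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return covariance / (var_x * var_y) ** 0.5
```

Here `x` might hold the path length of each problem and `y` the number of non-matching squares; a value near 1 would indicate that the two physical dimensions order the problems almost identically.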

The  actual  perceived  difficulty  ratings  showed  substantial  indi¬ 
vidual  differences  in  the  level  and  variability  of  difficulty  percep¬ 
tions,  even  for  the  same  set  of  problems.  Although  possible  individual 
biases  in  the  use  of  category  rating  scales  cannot  be  discounted,  the 
data  suggest  that  the  individual  differences  found  were  differences  in 
subjective  difficulties  relating  to  individual  differences  in  ability 
to  visualize  and  to  maintain  a  sequence  of  moves  in  short-term  memory. 
Examination  of  individual  difficulty  perceptions  across  problems  indi¬ 
cated  that  these  differences  were  reliable.  These  data  suggest  that  if 
the  reliable  differences  in  difficulty  perceptions  do  in  fact  relate  to 
differential  ability  to  visualize  successful  move  sequences,  then  an 
adequate  selection  of  problem  replications  should  be  able  to  tap  these 
differences,  resulting  in  reliable  performance  differences. 

Comparison  of  the  easy  problems  with  the  problems  that  challenged 
students  more  in  the  computer-administered  test  suggested  that  too  many 
of the problems could be solved in a reactive manner, that is, by responding
to the immediate stimulus pattern without trying to visualize
or  to  plan  several  moves  ahead.  Such  problems  would  not  tap  differ¬ 
ences  in  students'  ability  to  visualize  a  sequence  of  moves  because 
students  would  not  find  themselves  in  a  difficult  situation  by  not 
planning ahead. The more challenging problems (e.g., Problems S, 10,
12, and 13) were those in which a student could get "in trouble" by not
visualizing several moves in advance (see Appendix C). This implies
that future studies should include more problems that prevent reactive
solutions, i.e., require more planning ahead.

Comparison  of  the  mean  perceived  difficulty  of  the  problems  in¬ 
cluded  in  the  computer-administered  test  indicated  less  agreement  with 
actual  problem  difficulty  than  might  be  expected  from  other  studies. 


This  appeared  to  be  due  to  the  inability  of  students  to  differentiate 
the  relative  difficulties  of  problems  with  longer  solution  paths. 

Thus, to the extent that increased motivation under adaptive testing
depends on correct student perceptions of problem difficulty (Prestwood
& Weiss, 1977), adaptive administration of this problem type may not
have a motivational advantage. On the other hand, reduced frustration
would  seem  likely  to  result  under  adaptive  conditions  from  not  requir¬ 
ing  students  to  work  on  problems  much  more  difficult  than  their  ability 
levels,  even  if  they  cannot  accurately  perceive  the  actual  difficulty 
of  the  problem  beforehand. 

The  perceived  difficulty  scale  values  related  highly  (r  =  .75)  to 
the mean initial response latency measure for the computer-administered
problems. This supports the idea that the students spend time before
their first move trying to visualize a sequence of moves, since path
length  appeared  to  be  a  primary  rating  dimension  in  determining  per¬ 
ceived  difficulty. 

Conclusions

The  results  from  this  pilot  study  suggest  certain  improvements  in 
problem  selection  and  design.  Future  tests  of  this  type  should  consist 
of fewer but more difficult problems, particularly those which do not
permit reactive, impulsive solutions. If individual differences in the
ability to construct an optimal sequence of moves are to be tapped,
then more problems must be designed that force the student to plan
ahead. More complex problems should overload the memories of students
and  should  induce  differences  in  strategies  in  manipulating  the  number 
patterns. 

If  reliable  performance  indices  can  be  obtained,  the  process  of 
validating the meaning of the scores will be necessary. Do scores reflect
individual differences in spatial reasoning and problem-solving
ability or in personality variables like perseverance and impulsivity?
It might also be of interest to determine what information-processing
abilities underlie performance on these problems. For example, using
Carroll's (1974) provisional coding scheme for cognitive tasks appearing
in psychometric tests, the following cognitive operations might be
expected to underlie performance: (1) mental rotation of spatial
configurations in visual short-term memory, Factors S and Vz; (2) performing
serial operations in visual short-term memory, Factors S and Vz; and
(3) storage in and retrieval from short-term memory, Factor Ms.

The  results  reported  here  suggest  that  reasonable  indices  of  prob¬ 
lem  difficulty  are  obtainable  given  an  appropriate  norming  sample.  If 
reliable  and  valid  ability  scores  can  be  obtained  in  future  studies 
with  this  item  type,  this  type  of  test  would  seem  especially  appropri¬ 
ate  for  adaptive  administration,  since  (1)  scores  on  problems  tailored 
to the individual's ability are more apt to be more highly related to
each  other,  resulting  in  total  scores  with  higher  reliability;  (2) 
adaptive  administration  will  likely  improve  the  motivational  aspects  of 
the  tests,  which  seem  more  taxing  and  potentially  frustrating  than  con¬ 
ventional  item  formats;  and  (3)  equally  precise  measurements  for  most 
testees  can  be  obtained  in  shorter  periods  of  time  than  with  conven¬ 
tional  test  administration.  Thus,  the  data  suggest  that  future  devel¬ 
opment  of  adaptive  problem-solving  tests  of  the  type  studied  here  might 


result  in  new  types  of  ability  tests  that  should  provide  ability  scores 
to  supplement  those  available  from  the  paper-and-pencil  administration 
of typical ability measures.


APPENDICES


Appendix  A: 

Diagnostic  Error  Messages  Provided  by  Testing  System 


Illegal  Moves: 

18 IS NOT A NUMBER IN THE PATTERN. REMEMBER TO PUSH THE "SPACE"
BAR FIRST IF THE NUMBER TO BE ENTERED CONTAINS ONLY ONE
DIGIT.

10P  IS  NOT  A  CORRECT  MOVE.  THE  LAST  CHARACTER  TYPED  MUST  BE  AN 
L,  R,  U,  OR  D. 

10  CAN  NOT  BE  MOVED  LEFT  (RIGHT,  UP,  DOWN)  FROM  ITS  PRESENT 
POSITION. 


Maximum  Move  Limit  Reached: 

YOU  HAVE  REACHED  THE  MAXIMUM  NUMBER  OF  MOVES  ALLOWED  FOR  THIS 
PROBLEM.  PLEASE  CONTACT  THE  PROCTOR. 


Computer Data File Error:

THE COMPUTER IS HAVING PROBLEMS. PLEASE NOTIFY THE PROCTOR.
(ERROR 06 HAS OCCURRED. IERR IS -5).


Maximum  Time  Limit  Reached: 

IT  MIGHT  BE  A  GOOD  IDEA  TO  GO  ON  TO  THE  NEXT  PROBLEM.  PLEASE 
CONTACT  THE  PROCTOR. 
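The checks implied by the illegal-move messages above can be sketched as a small move validator. This is an illustration only, not the original testing system's code: the function name, the error codes, and the order of the checks are hypothetical, and the second row of the example board (5 6 7 8) is an assumption.

```python
# Row/column offsets for the four direction letters.
DIRECTIONS = {"L": (0, -1), "R": (0, 1), "U": (-1, 0), "D": (1, 0)}

# Example start pattern; 0 marks the empty square.
# The second row is assumed for illustration.
EXAMPLE_BOARD = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 0, 11],
    [12, 13, 14, 15],
]


def apply_move(board, move):
    """Validate a 3-character move such as '10R' or ' 7D'.

    Returns (new_board, None) for a legal move, or (None, error_code)
    where error_code is one of the hypothetical labels below.
    """
    if len(move) != 3 or move[2] not in DIRECTIONS:
        return None, "BAD_DIRECTION"       # last character not L, R, U, or D
    digits = move[:2].strip()              # a one-digit number is typed after a space
    tile = int(digits) if digits.isdigit() else None
    position = next(((r, c) for r in range(4) for c in range(4)
                     if tile and board[r][c] == tile), None)
    if position is None:
        return None, "NOT_IN_PATTERN"      # e.g. 18 is not a number in the pattern
    row, col = position
    d_row, d_col = DIRECTIONS[move[2]]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < 4 and 0 <= new_col < 4
    if not inside or board[new_row][new_col] != 0:
        return None, "CANNOT_MOVE"         # target square is occupied or off the pattern
    new_board = [line[:] for line in board]
    new_board[new_row][new_col], new_board[row][col] = tile, 0
    return new_board, None
```

On the example board, only `10R`, `11L`, `14U`, and ` 7D` succeed, which matches the four possible moves named in the instruction screens; `11R` is rejected because it would leave the square pattern.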


Screen 1

HELLO AND THANK YOU FOR YOUR PARTICIPATION IN THIS STUDY.

THE COMPUTER WILL SOON PRESENT YOU WITH A SERIES OF PUZZLES TO WORK ON,
BUT FIRST SOME INSTRUCTIONS WILL BE GIVEN TO BE SURE YOU UNDERSTAND
HOW TO USE THE TYPEWRITER KEYBOARD TO ENTER YOUR RESPONSES.

FOLLOWING THE INSTRUCTIONS YOU WILL BE GIVEN A PRACTICE PROBLEM TO
CLEAR UP ANY PROBLEMS YOU ARE HAVING. IN ADDITION, IF YOU HAVE
QUESTIONS AT ANY TIME ABOUT THE INSTRUCTIONS OR ANYTHING ELSE PLEASE
FEEL FREE TO CONTACT THE TEST PROCTOR.

YOU MUST REMEMBER TWO THINGS IN ORDER TO TALK TO THE COMPUTER:

1. ONLY TYPE SOMETHING WHEN A MESSAGE ON THE SCREEN
IN FRONT OF YOU TELLS YOU TO DO SO AND A QUESTION
MARK (?) APPEARS.

2. EACH TIME YOU TYPE A RESPONSE ON THE KEYBOARD
THE COMPUTER DOES NOT RECEIVE IT UNTIL YOU PRESS
THE "RETURN" KEY.

NOW THE FIRST THING YOU MUST DO IS FIND THE "RETURN"
KEY. THIS KEY IS THE LARGE RECTANGULAR KEY ON THE
RIGHT END OF THE KEYBOARD. PRESS THE "SPACE" BAR AT THE BOTTOM OF THE
KEYBOARD AND THE "RETURN" KEY TO CONTINUE THE INSTRUCTIONS.

? 

Screen  2 


 1  2  3  4         1  2  3

 9 10    11         9 10 11  8

12 13 14 15        12 13 14 15

IN EACH OF THE PUZZLES OF THE TYPE SHOWN HERE YOUR TASK IS TO
TYPE IN A SEQUENCE OF "MOVES" TO CHANGE THE PATTERN OF NUMBERS
ON THE LEFT UNTIL IT MATCHES THE PATTERN ON THE RIGHT. A "MOVE"
CONSISTS OF 3 TYPED CHARACTERS FOLLOWED BY THE "RETURN" KEY. THE
FIRST 2 CHARACTERS TELL THE COMPUTER WHICH "NUMBER" IN THE
PATTERN ON THE LEFT YOU WANT TO MOVE. THE THIRD CHARACTER
(WHICH YOU WILL BE TOLD ABOUT SHORTLY) TELLS THE COMPUTER WHAT
DIRECTION YOU WANT TO MOVE THE NUMBER.

IF THE NUMBER YOU WISH TO MOVE HAS 2 DIGITS YOU SHOULD TYPE
THE 2 DIGITS ON THE KEYBOARD. IF THE NUMBER YOU WISH TO MOVE
HAS ONLY 1 DIGIT YOU SHOULD TYPE THE SPACE BAR ONCE AND THEN THE
DESIRED DIGIT. THUS, THE TWO DIGIT NUMBERS 10 TO 15 CAN BE TYPED
IN DIRECTLY, WHILE THE "SPACE" BAR MUST BE TYPED FIRST WITH
THE NUMBERS 1 TO 9.

PRESS THE "SPACE" BAR AND "RETURN" TO CONTINUE.

Screen 3

 1  2  3  4         1  2  3

 9 10    11         9 10 11  8

12 13 14 15        12 13 14 15

AS MENTIONED ABOVE THE THIRD CHARACTER IN YOUR "MOVE" TELLS THE
COMPUTER WHAT DIRECTION TO MOVE THE NUMBER IN THE LEFT PATTERN.
NUMBERS CAN ONLY BE MOVED INTO THE SPACE IN THE SQUARE PATTERN
WHICH IS NOT OCCUPIED BY A NUMBER. YOU TELL THE COMPUTER WHICH
DIRECTION TO MOVE THE NUMBER BY TYPING ONE OF THE FOLLOWING 4 LETTERS

L - IF YOU WANT TO MOVE A NUMBER TO THE LEFT ONE SPACE

R - IF YOU WANT TO MOVE A NUMBER TO THE RIGHT ONE SPACE

U - IF YOU WANT TO MOVE A NUMBER UP ONE SPACE

D - IF YOU WANT TO MOVE A NUMBER DOWN ONE SPACE

THUS, IN THE PATTERN SHOWN HERE THE FOLLOWING 4 MOVES ARE
POSSIBLE: 10R, 11L, 14U, OR <SPACE BAR>7D. ANY OTHER MOVE
WOULD BE ILLEGAL AND RESULT IN A REMINDER MESSAGE BEING
PRINTED BY THE COMPUTER. FOR EXAMPLE, YOU COULD NOT TRY TO MOVE
THE "11" SQUARE TO THE RIGHT ONE SPACE SINCE ALL MOVES MUST
STAY WITHIN THE SQUARE PATTERN.

PRESS THE "SPACE" AND "RETURN" TO CONTINUE INSTRUCTIONS.

Screen  4 

IF YOU HAVE MADE A LEGAL MOVE THE COMPUTER WILL AUTOMATICALLY
AND VERY QUICKLY UPDATE THE PATTERN ON THE LEFT WHERE
YOU ARE MAKING YOUR MOVES. IF YOUR MOVE IS NOT LEGAL A
MESSAGE WILL BE PRINTED UNDER YOUR MOVE AND YOU SHOULD TRY
AGAIN WHEN THE COMPUTER TELLS YOU TO DO SO.

IF YOU ARE HAVING DIFFICULTY UNDERSTANDING THE INSTRUCTIONS
SO FAR PLEASE CALL THE PROCTOR. OTHERWISE PRESS THE "SPACE"
BAR AND "RETURN" TO CONTINUE THE INSTRUCTIONS.

? 

* 

Screen 5

SUPPOSE  YOU  MAKE  A  MISTAKE  TYPING  SOMETHING  INTO  THE 
COMPUTER.  YOU  CAN  CORRECT  A  MISTYPED  CHARACTER  AT  ANY 
TIME  BEFORE  YOU  PRESS  THE  "RETURN"  KEY.  BY  PRESSING  THE 
BACKSPACE  KEY  WHICH  IS  LOCATED  IN  THE  TOP  RIGHT  CORNER 
OF  THE  KEYBOARD  YOU  WILL  "ERASE"  THE  LAST  CHARACTER  YOU 
TYPED.  TO  "ERASE"  THE  LAST  TWO  CHARACTERS  YOU  TYPED  PRESS 
THE "BACKSPACE" KEY TWICE AND SO ON.

AFTER  PRESSING  "BACKSPACE"  THE  CORRECT  CHARACTER  CAN  THEN 
BE  TYPED  IN.  REMEMBER  TO  PRESS  THE  "RETURN"  KEY  TO  SEND 
THE  CORRECTED  CHARACTERS  TO  THE  COMPUTER. 

TO SEE HOW THE "BACKSPACE" WORKS TRY TYPING THE MOVE '14D'
ON THE KEYBOARD. THEN CHANGE THE 'D' TO A 'U' BY PUSHING THE
"BACKSPACE" KEY ONCE AND THEN THE CORRECT LETTER 'U'.

FINALLY,  PRESS  THE  "RETURN"  KEY  TO  SEND  YOUR  CORRECTED  MOVE  TO 
THE COMPUTER.

Screen 6


YOU ARE NOW ALMOST READY TO BEGIN WORKING. FIRST, HOWEVER, WE NEED
SOME INFORMATION ABOUT YOU.

THE RESULTS OF THE PROBLEMS YOU WILL WORK ON WILL BE
STRICTLY CONFIDENTIAL. WE ARE INTERESTED IN YOU AS PART
OF A LARGER GROUP, AND AT NO TIME WILL YOUR SCORES BE
CONNECTED WITH YOUR NAME.

BUT WE NEED IDENTIFICATION SO THAT WE CAN KEEP YOUR ANSWERS
SEPARATE FROM OTHER PEOPLE'S AND SO THAT WE CAN COMPARE THE
RESULTS OF THESE SCORES WITH ANY OTHER DATA CONTRIBUTED BY
YOU AT AN EARLIER OR LATER TIME.

PLEASE TYPE YOUR FIRST NAME (JUST YOUR FIRST NAME THIS TIME),
AND THEN "RETURN".

?

Screen 7


PLEASE TYPE YOUR MIDDLE INITIAL (ONE LETTER ONLY).
IF YOU DO NOT HAVE A MIDDLE NAME, TYPE A "?".

DON'T  FORGET  TO  PRESS  "RETURN". 

? 

* 


Screen  8 


PLEASE  TYPE  YOUR  LAST  NAME  AND  PRESS  "RETURN". 
? 

*

Screen 9


PLEASE TYPE YOUR SIX OR SEVEN DIGIT STUDENT IDENTIFICATION NUMBER
AND "RETURN".

IF YOU DO NOT REMEMBER YOUR IDENTIFICATION NUMBER AND DO NOT
HAVE IT WITH YOU CALL THE PROCTOR FOR A SUBSTITUTE IDENTIFICATION
NUMBER.

? 
Screen 10


NOW WE WOULD LIKE TO KNOW A FEW THINGS ABOUT YOU. IF
THE QUESTION DOES NOT APPLY TO YOU OR YOU DON'T WANT TO
RESPOND, TYPE IN A QUESTION MARK AND "RETURN".

PLEASE TYPE YOUR AGE AND PRESS "RETURN".

? 
Screen 11


WHICH SEX ARE YOU?

1. FEMALE
2. MALE

TYPE  THE  CORRECT  NUMBER  AND  PRESS  "RETURN". 
? 
Screen 12


PLEASE TYPE THE NUMBER CORRESPONDING TO YOUR YEAR IN SCHOOL:

1. FRESHMAN
2. SOPHOMORE
3. JUNIOR
4. SENIOR
5. GRADUATE STUDENT
6. OTHER

DON'T FORGET TO PRESS "RETURN".

? 


Screen  13 


LISTED BELOW ARE SEVERAL OF THE COLLEGES WITHIN THE UNIVERSITY.

1. COLLEGE OF LIBERAL ARTS (CLA)
2. COLLEGE OF AGRICULTURE
3. COLLEGE OF BIOLOGICAL SCIENCES
4. COLLEGE OF BUSINESS ADMINISTRATION
5. COLLEGE OF EDUCATION
6. GENERAL COLLEGE
7. COLLEGE OF HOME ECONOMICS
8. INSTITUTE OF TECHNOLOGY
9. SCHOOL OF FORESTRY
10. UNIVERSITY COLLEGE
11. COLLEGE OF VETERINARY MEDICINE
12. GRADUATE SCHOOL
13. LAW SCHOOL
14. OTHER

PRESS THE NUMBER OF THE SCHOOL IN WHICH YOU ARE ENROLLED AND
THE "RETURN" KEY.

? 
Screen 14


WHAT IS YOUR RACE?

1. AFRO-AMERICAN (BLACK)
2. MEXICAN-AMERICAN
3. PUERTO-RICAN
4. OTHER LATIN AMERICAN
5. ORIENTAL OR ASIAN-AMERICAN
6. NATIVE-AMERICAN (INDIAN)
7. WHITE
8. OTHER

TYPE THE NUMBER THAT GIVES YOUR RACE, AND PRESS "RETURN".
? 
Screen 15


IN WHICH CATEGORY IS YOUR CUMULATIVE GRADE-POINT AVERAGE (GPA)?

1) 3.76 TO 4.00
2) 3.51 TO 3.75
3) 3.26 TO 3.50
4) 3.01 TO 3.25
5) 2.76 TO 3.00
6) 2.51 TO 2.75
7) 2.26 TO 2.50
8) 2.01 TO 2.25
9) 2.00 OR LESS

TYPE THE CATEGORY NUMBER ("1" THROUGH "9") AND PRESS "RETURN".
? 
Screen 16


YOU ARE NOW READY TO TRY A PRACTICE PROBLEM.

IN THE PRACTICE PROBLEM AND THE ACTUAL PROBLEMS TO FOLLOW
AN IMPORTANT GOAL IN TRYING TO MAKE THE PATTERN ON THE LEFT
MATCH THE PATTERN ON THE RIGHT IS TO DO SO WITH AS
FEW MOVES AS POSSIBLE. YOUR PERFORMANCE WILL BE DETERMINED
NOT ONLY BY WHETHER YOU ARE ABLE TO MATCH THE TWO PATTERNS
BUT ALSO BY HOW FEW MOVES IT TAKES YOU TO DO SO.

THERE IS NO TIME LIMIT ON ANY OF THE PUZZLES BUT TRY TO
USE YOUR TIME WISELY WHILE STILL TRYING TO USE AS FEW MOVES
AS POSSIBLE. TRY TO COMPLETE EACH PROBLEM. IF, HOWEVER, YOU
HAVE WORKED A LONG TIME ON A SINGLE PROBLEM AND FEEL YOU CAN NOT
SOLVE IT CONTACT THE PROCTOR WHO WILL GET THE COMPUTER TO
PRESENT THE NEXT PROBLEM.

A SUMMARY OF HOW TO TYPE IN YOUR THREE CHARACTER MOVE
AS DESCRIBED EARLIER WILL BE PRESENTED WITH EACH PUZZLE
AS A REMINDER.

IF YOU HAVE ANY QUESTIONS ABOUT WHAT YOU ARE SUPPOSED
TO DO CALL THE PROCTOR. OTHERWISE PRESS THE "SPACE"
BAR AND "RETURN" KEY TO BEGIN YOUR PRACTICE PROBLEM.

?
Instruction and Recording Booklet for Perceived Difficulty Rating Study


Directions 

Thank  you  for  your  participation.  In  this  study,  you  will  be  asked  to 
sort  certain  puzzles  into  piles  based  on  how  difficult  they  appear  to  you. 
Although  you  will  not  actually  solve  the  puzzles  yourself,  you  will  need  to 
know  how  they  would  be  solved  so  that  you  can  estimate  how  difficult  they 
would  be.  All  the  puzzles  will  be  of  the  type  pictured  here. 


Make  your  moves  in  this  pattern  Try  to  match  this  pattern 

Figure 1

The way to solve these puzzles is to "move" the numbers in the left pattern
so that the left pattern will match the pattern on the right. A number
may  only  be  moved  into  the  blank  square  in  the  left  pattern.  For  example, 
to  solve  this  particular  puzzle  (Fig.  1)  one  must  make  3  "moves"  as  follows: 

Move  1 

First,  by  moving  the  "9"  up  one  square  in  the  left  pattern,  we  obtain 
the  following  new  pattern: 

Figure 2


Move  2 


By  moving  the  "13"  up  one  square  in  this  new  pattern  (Fig.  2)  we  obtain 
the  following  pattern: 

Figure 3


Move 3


Finally, by moving the "12" right one square, we obtain the following
pattern  which  solves  the  puzzle  since  it  matches  the  original  right-hand 
pattern  in  Fig.  1. 

Figure 4


If  at  this  point  you  do  not  understand  how  these  puzzles  are  solved, 
please  contact  the  proctor  before  reading  on. 
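
The three-move example above can be traced with a short Python sketch. The start and goal patterns of Fig. 1 are not reproduced in this text, so the board below is a hypothetical arrangement chosen only so that the same three moves (9 up, 13 up, 12 right) are legal. The names `START`, `GOAL`, `find`, and `slide` are likewise illustrative and do not come from the study.

```python
BLANK = 0  # stand-in for the empty square

# Hypothetical start pattern (not the actual Fig. 1 configuration).
START = [
    [1, 2, 3, 4],
    [5, BLANK, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

# Hypothetical goal pattern reached by the three moves of the example.
GOAL = [
    [1, 2, 3, 4],
    [5, 9, 6, 7],
    [8, 13, 10, 11],
    [BLANK, 12, 14, 15],
]

DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(board, value):
    """Return the (row, column) position of a value on the board."""
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == value:
                return r, c
    raise ValueError(f"{value} is not on the board")


def slide(board, tile, direction):
    """Return a new board with `tile` moved one square into the blank.

    Raises ValueError if the target square is off the board or not blank.
    """
    r, c = find(board, tile)
    dr, dc = DELTAS[direction]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < 4 and 0 <= nc < 4) or board[nr][nc] != BLANK:
        raise ValueError(f"illegal move: {tile} {direction}")
    new_board = [row[:] for row in board]
    new_board[nr][nc], new_board[r][c] = tile, BLANK
    return new_board


board = START
for tile, direction in [(9, "up"), (13, "up"), (12, "right")]:
    board = slide(board, tile, direction)

print(board == GOAL)  # True
```

Because a number may only be moved into the blank square, `slide` rejects any other move, which is the same rule the instructions state in words.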

You  will  be  presented  with  a  number  of  these  puzzles  of  varying  difficulty. 
Your  task  is  to  study  each  puzzle  and,  keeping  in  mind  how  such  puzzles  are 
solved,  estimate  how  difficult  each  puzzle  would  be.  You  should  do  this  using 
the  following  steps.  You  should  complete  each  step  before  going  on  to  the 
next step. If you have any questions don't hesitate to contact the proctor.

Step 1 - Sort of Puzzles

First,  study  each  puzzle  and  place  it  in  one  of  the  six  piles  provided 
by  the  proctor  labelled: 

Very Difficult, Difficult, Somewhat Difficult, Somewhat Easy, Easy,
Very Easy.

There  is  no  requirement  that  each  pile  contain  a  certain  number  of 
puzzles.  You  may  feel,  for  example,  that  none  of  the  puzzles  fits  the 
description  "somewhat  easy".  Just  place  each  puzzle  in  the  pile  that  you 
feel  provides  the  best  description  of  how  difficult  it  would  be  to  solve 
the  puzzle.  You  should  try  to  make  your  initial  placement  as  accurate  as 
possible  but  you  are  free  to  change  the  location  of  any  puzzle  you  wish  if 
you  change  your  mind  about  its  difficulty.  Remember  that  you  do  not  have  to 
actually  solve  the  puzzles.  Just  study  each  puzzle  long  enough  to  feel 
reasonably  confident  about  which  pile  to  place  it  into. 

A few of the puzzles contain a puzzle number and the message "Provide
your reason(s)" on the top. For these puzzles, you should write down the
puzzle  number  shown,  the  pile  in  which  you  placed  it,  and  the  reason(s)  for 
why  you  are  sorting  the  puzzle  into  that  pile.  Use  the  space  provided  just 
below  for  this  purpose. 

For  example,  if  you  feel  the  puzzle  would  be  "very  easy"  to  solve  then 
place  the  card  in  the  "very  easy"  pile  and  explain  why  you  think  it  would  be 
"very  easy"  to  solve  next  to  the  puzzle  number  on  the  Data  Sheet.  Do  not 
just write a reason like "Because it is solved very easily or very
quickly." Explain how you decided it would be very easy, that is,
on  what  basis  did  you  decide  to  sort  it  into  the  "very  easy"  pile. 


Provide your reason(s)


Puzzle Number     Assigned Pile     Reason(s) for sorting into the Pile you Did


Step  2  -  Record  sorting  results 


Each  puzzle  card  has  a  number  on  the  back.  When  you  have  finished 
sorting  the  puzzles  into  the  6  piles  list  these  numbers  under  the  appropriate 
label  below.  There  is  no  required  number  of  puzzles  for  any  category. 


Very  Difficult 

Difficult 

Somewhat  Difficult 

Somewhat  Easy 

Easy 

Very Easy

Step  3  -  Subdividing  the  6  piles 


Examine  the  puzzles  you  have  sorted  into  each  pile  in  Step  2.  You  may 
feel  that  not  all  puzzles  in  a  given  pile  seem  equally  difficult  to  you  even 
though  they  can  all  be  described  as  "very  difficult",  or  "somewhat  easy"  for 
example.  If  you  feel  this  is  the  case,  subdivide  the  puzzles  within  each  of 
the  original  piles  into  as  many  smaller  sub-piles  representing  different 
degrees  of  difficulty  as  you  can.  Only  create  more  subpiles  if  you  feel 
you  can  distinguish  differences  in  difficulty  between  the  puzzles  in  a  given 
pile.  If  you  cannot  differentiate  the  difficulty  of  the  puzzles  within  a 
given  pile  then  do  not  subdivide  the  pile  any  further.  Continue  subdividing 
the  piles  until  you  can  no  longer  differentiate  the  difficulty  of  the  puzzles 
in  each  pile.  During  this  step  you  should  only  compare  and  subdivide 
puzzles  within  each  of  the  original  six  piles  separately.  Do  not  switch 
puzzles  from  one  of  the  original  6  piles  to  another  one,  for  example,  from 
"Easy"  to  "Very  Easy". 

If,  when  you  have  completed  this  step  you  have  been  able  to  subdivide 
any  of  the  original  6  categories,  list  the  card  numbers  in  each  pile  in  the 
space  provided  below.  When  you  list  the  subpiles  always  put  the  hardest  puzzles 
within  a  category  in  subpile  1,  the  second  hardest  puzzles  in  subpile  2,  and  so  on. 


Very Difficult   Difficult    Somewhat Difficult   Somewhat Easy   Easy         Very Easy
subpiles         subpiles     subpiles             subpiles        subpiles     subpiles
1  2  ...        1  2  ...    1  2  ...            1  2  ...       1  2  ...    1  2  ...
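
One way the records from Steps 2 and 3 could be combined into a single ordinal difficulty rank for each puzzle is sketched below in Python. This is only an illustration under stated assumptions: these instructions do not describe how the ratings were actually scored, and the data layout, the card numbers, and the function name `rank_puzzles` are hypothetical.

```python
# Pile labels from the recording sheet, hardest first.
PILES = [
    "Very Difficult", "Difficult", "Somewhat Difficult",
    "Somewhat Easy", "Easy", "Very Easy",
]


def rank_puzzles(sorting):
    """Map each card number to an ordinal rank (1 = hardest).

    `sorting` maps a pile label to a list of subpiles. Each subpile is a
    list of card numbers, with subpile 1 (the hardest) listed first.
    Cards in the same subpile are tied and share a rank.
    """
    ranks = {}
    rank = 0
    for pile in PILES:
        for subpile in sorting.get(pile, []):
            if not subpile:
                continue  # an empty subpile does not use up a rank
            rank += 1
            for card in subpile:
                ranks[card] = rank
    return ranks


# Hypothetical record for one rater (card numbers are made up).
example = {
    "Very Difficult": [[7], [3, 12]],
    "Difficult": [[5]],
    "Easy": [[1, 9]],
}
print(rank_puzzles(example))
# {7: 1, 3: 2, 12: 2, 5: 3, 1: 4, 9: 4}
```

The sketch relies on two rules stated in the instructions: puzzles never move between the original six piles in Step 3, and subpile 1 always holds the hardest puzzles within a category.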

Step  4 


Please answer the following questions as completely as possible.

1.  Your  name  _ 

2.  Your  student  identification  number  _ 


3.  Before  today,  how  often  had  you  tried  to  solve  the  kind  of  puzzle  you 
were  asked  to  estimate  the  difficulty  of  in  this  study? 

a.  never 

b.  a  few  times 

c.  many  times 

4.  How  much  difficulty  did  you  have  understanding  what  you  were  supposed 
to  do  in  this  study? 

a.  no  difficulty 

b.  a  little  difficulty 

c.  much  difficulty 

5.  When  you  sorted  the  puzzles  into  the  original  6  categories,  did  you  use 
any  "rules"  or  criteria  for  sorting  something  into  "very  difficult", 
"difficult",  "somewhat  difficult",  "somewhat  easy",  "easy",  and  "very  easy"? 

YES  NO 

If  so,  what  were  they? 

Very  difficult  - 
Difficult  - 
Somewhat  difficult  - 


Somewhat  easy  - 
Easy  - 
Very  easy  - 
6. If you were able to subdivide the original 6 piles into more piles in
Step 3, on what basis did you do so? That is, how did you decide which
puzzles within a pile were more difficult than others?


7. If you did not subdivide any of the original 6 piles, try to explain
why you could not do so.


8. How often did you use each of the following considerations in
deciding how difficult a puzzle would be:

a. The number of "moves" required to solve the puzzle
        All    Most    Some    None    (of the puzzles)

b. The number of "numbers" which did not match in the two patterns
        All    Most    Some    None    (of the puzzles)

c. Whether in one of the patterns the numbers were in numeric
order from 1 to 15
        All    Most    Some    None    (of the puzzles)

d. How far apart certain numbers were in the two puzzles
        All    Most    Some    None    (of the puzzles)

e. The number of rows in the two patterns that did not match
        All    Most    Some    None    (of the puzzles)

f. The location of the "empty space" in the left pattern
        All    Most    Some    None    (of the puzzles)

g. The number of columns in the two puzzles that did not match
        All    Most    Some    None    (of the puzzles)

h. Whether you could "see" the actual sequence of moves that
would be needed to solve the problem
        All    Most    Some    None    (of the puzzles)

i. The amount of time it would take to solve the problem
        All    Most    Some    None    (of the puzzles)

9. Did the length of this study affect your ability to perform the tasks
required?

a.  not  at  all 

b.  somewhat 

c.  quite  a  bit 


10. How did you feel about working on this study?

a.  I  disliked  it  a  lot 

b.  I  disliked  it  somewhat 

c. I felt neutral about it

d.  I  enjoyed  it  somewhat 

e.  I  enjoyed  it  a  lot 

Any  further  comments? 


of  Perceived  Difficult 

The number of moves required to solve the puzzle or an explication of the
actual moves needed:

    "It only took a few moves"

    "The '12' and '13' will go around corner into place, others look
    like they will move easily"

Whether subject could "see" the actual sequence of moves that would be
needed to solve the problem (no number or explication of the moves
provided):

    "I can work this out just at a glance--its obvious"

    "I see logical moves"

The number of squares ("numbers") which did not match in the two patterns:

    "All numbers--same location, except for '3' in bottom right hand
    corner"

    "I only had to deal with 5/16 of the digits"

The amount of time it would take to solve the problem:

    "Took 10 seconds to solve"

    "Took a while to see the pattern"

The type of moves required to solve the puzzle:

    "Some complicated moves must be made"

    "Tricky or misleading moves"

    "Needed a combination of movements of sets of numbers including
    moving number that was in correct spot to allow for other
    movements, then replacing at end"

How far apart certain numbers were in the two puzzles:

    "Don't move numbers very far"

    "Numbers in some cases move a great distance"

How much thought was required to solve the problem:

    "Required lots of thought"

    "I had trouble keeping all the moves in my head"

The number of columns not matching in the two patterns:

    "Because you only have to deal with two of the four columns"

The number of rows not matching in the two patterns:

    "Two rows match already"

The location of the 'empty space' in the left pattern:

    "Will require using the right columns because it contains the
    open space"

Similarity to an already solved or rated puzzle:

    "This puzzle easier since it resembles one already solved"

Whether either the left or the right pattern was in numeric order from
1 to 15:

    (There were no examples of this dimension in the voluntary protocols.)
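
Several of the dimensions listed above are simple counts that can be computed directly from a start pattern and a goal pattern. The Python sketch below shows one way to compute three of them: the number of squares, rows, and columns that do not match. The example patterns and the function names are hypothetical, and treating the blank as one of the 16 squares is an assumption suggested by the "5/16" remark rather than a rule stated in the report.

```python
BLANK = 0  # stand-in for the empty square


def mismatched_squares(start, goal):
    """Count positions whose contents differ between the two patterns."""
    return sum(
        s != g
        for start_row, goal_row in zip(start, goal)
        for s, g in zip(start_row, goal_row)
    )


def mismatched_rows(start, goal):
    """Count rows that differ in at least one position."""
    return sum(list(s) != list(g) for s, g in zip(start, goal))


def mismatched_columns(start, goal):
    """Count columns that differ in at least one position."""
    # Transposing both patterns turns columns into rows.
    return mismatched_rows(list(zip(*start)), list(zip(*goal)))


# Hypothetical patterns (not taken from the study materials).
START = [
    [1, 2, 3, 4],
    [5, BLANK, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
GOAL = [
    [1, 2, 3, 4],
    [5, 9, 6, 7],
    [8, 13, 10, 11],
    [BLANK, 12, 14, 15],
]

print(mismatched_squares(START, GOAL))  # 4
print(mismatched_rows(START, GOAL))     # 3
print(mismatched_columns(START, GOAL))  # 2
```

Counts like these describe only surface features of a puzzle; the protocols above also cite the type of moves and the amount of thought required, which such counts do not capture.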